Psychological Bulletin 


WAYNE DENNIS, Editor
Brooklyn College

Consulting Editors

LAUNOR F. CARTER, Rand Corporation, Santa Monica, California
EDWARD GIRDEN, Brooklyn College
VICTOR C. RAIMY, University of California
ROBERT L. THORNDIKE, Teachers College, Columbia University
BENTON J. UNDERWOOD, Northwestern University
S. RAINS WALLACE, Life Insurance Agency Management Association

LORRAINE BOUTHILET, Managing Editor


PUBLISHED BIMONTHLY BY
THE AMERICAN PSYCHOLOGICAL ASSOCIATION, INC.
1333 SIXTEENTH STREET N.W.
WASHINGTON 6, D. C.
Copyright, 1956, by the American Psychological Association, Inc.
CONTENTS OF VOLUME 53


ANASTASI, ANNE. Intelligence and Family Size
ARNOULT, M. D. See ATTNEAVE, F.
ATTNEAVE, F., AND ARNOULT, M. D. The Quantitative Study of Shape and Pattern Perception
BINDRA, D., AND WAKSBERG, HÉLÈNE. Methods and Terminology in Studies of Time Estimation
BRACKBILL, G. A. Studies of Brain Dysfunction in Schizophrenia
CHRISTIE, R. Eysenck’s Treatment of the Personality of Communists
CHRISTIE, R. Some Abuses of Psychology
CORSO, J. F. The Neural Quantum Theory of Sensory Discrimination
DAY, R. H. Relative Task Difficulty and Transfer of Training in Skilled Performance
ELLINGSON, R. J. Brain Waves and Problems of Psychology
ESTES, W. K. The Problem of Inference from Curves Based on Group Data
EYSENCK, H. J. The Psychology of Politics: A Reply
EYSENCK, H. J. The Psychology of Politics and the Personality Similarities Between Fascists and Communists
FALK, J. L. Theories of Visual Acuity and Their Physiological Bases
FURCHTGOTT, E. Behavioral Effects of Ionizing Radiations
FRANK, G. H. See GUERTIN, W. H.
FRANK, J. D. See ROSENTHAL, D.
GAGE, N. L., LEAVITT, G. S., AND STONE, G. C. The Intermediary Key in the Analysis of Interpersonal Perception
GOSS, A. E., AND WISCHNER, G. J. Vicarious Trial and Error and Related Behavior
... Detect Acceleration of Target ...
GUERTIN, W. H., FRANK, G. H., AND RABIN, A. I. Research with the Wechsler-Bellevue Intelligence Scale: 1950-1955
KARSON, S., AND SELLS, S. B. Comments on Meehl and Rosen's Paper
LEAVITT, G. S. See GAGE, N. L.
LEVITT, E. E. The Water-Jar Einstellung Test as a Measure of Rigidity
LEVITT, E. E. See SCHAEFFER, M. S.
LYKKEN, D. T. A Method of Actuarial Pattern Analysis
MANN, J. H. Experimental Evaluations of Role Playing
MUENZINGER, K. F. On the Origin and Early Use of the ... (VTE)
RAZRAN, G. Backward Conditioning
ROKEACH, M., AND HANLEY, C. Eysenck's Tender-Mindedness Dimension: A Critique
ROKEACH, M. See HANLEY, C.
ROSENTHAL, D., AND FRANK, J. D. Psychotherapy and the Placebo Effect
SALZINGER, K. Techniques for Computing Shift in a Scale of Absolute Judgment
SCHAEFFER, M. S., AND LEVITT, E. E. Concerning Kendall’s Tau, a Nonparametric Correlation Coefficient
STONE, G. C. See GAGE, N. L.
TAYLOR, F. K. Remarks Concerning Willerman’s Paper on Kendall's W and Sociometric-Type Ranking
TRYON, R. C. See HIRSCH, J.
UNDERWOOD, B. J., AND RICHARDSON, J. Some Verbal Materials for the Study of Concept Formation
VERPLANCK, W. S. The Operant Conditioning of Human Motor Behavior


Vol. 53, No. 1
January, 1956


BRAIN WAVES AND PROBLEMS OF PSYCHOLOGY


ROBERT J. ELLINGSON

Nebraska Psychiatric Institute
University of Nebraska College of Medicine


Since the war clinical and experimental
electroencephalography have
flourished. Studies employing EEG
techniques have revealed much about
electrophysiological mechanisms associated
with normal and abnormal
brain functions. The purpose of this
paper is to summarize recent experimental
findings, and hypotheses arising
from them, which pertain to problems
of traditional interest to psychologists.

In his chapter on electroencephalography
in Hunt's 1944 handbook,
Lindsley reviewed the then-known
relationships between EEG
phenomena and physiological and
psychological processes. A knowledge
of fundamentals as presented by
Lindsley will be assumed in this
paper, and material which appeared
in that chapter will not be reviewed
here. Material which does not at
present appear to be related to psychological
functions will also be
omitted.


SLEEP AND WAKEFULNESS 


In 1939, after an exhaustive review
of the literature on sleep and
wakefulness, Kleitman (99) concluded
that, although the flow of sensory
impulses into the brain is instrumental
in initiating and maintaining
the state of wakefulness, the brain 
mechanisms involved cannot be as 
simple as might first appear. Activity 
of cortical origin must be capable of 
maintaining the waking state in the 
absence of heavy afferent inflow. 
Kleitman called wakefulness main- 
tained by sensory inflow “wakeful- 
ness of necessity,” and that main- 
tained by cortical activity “wakeful- 
ness of choice.” On the basis of the 
pathological and experimental data 
at his disposal, he concluded that 
both sensory-afferent and cortical 
influences must work through a 
“wakefulness center” in the brain, 
which is probably located in the hy- 
pothalamus and perhaps extends 
into the thalamus and midbrain, and 
further, that as long as the wakeful- 
ness center is active it maintains the 
waking state by its influence on the 
brain as a whole. He recently reit- 
erated these conclusions (100). 
Experiments by Magoun, Linds- 
ley, and their colleagues indicate that 
the brain stem reticular formation 
(BSRF) fulfills the requirements of a
wakefulness center as postulated by
Kleitman. The BSRF is located
ventromedially in the brain stem and
extends from the medulla through
the pons, midbrain, sub- and hypothalamus,
and into the ventromedial
thalamus. Thus it occupies the ap- 
proximate position predicted by 
Kleitman to contain the wakefulness 
center, but is somewhat more exten- 
sive than he predicted (133). 

In some of the experiments of 
Magoun et al. observations of EEG²
changes were substituted for observa- 
tions of behavioral changes because 
of the close relationship between par- 
ticular EEG patterns and the vari- 
ous stages of the sleep-wakefulness 
cycle—alertness, relaxation, drowsi- 
ness, and light and deep sleep (121). 
The general scheme and sequence of 
patterns is very similar among the 
higher vertebrate species. The EEG 
of an animal drowsing or asleep is 
“hypersynchronized,” that is, char- 
acterized by the presence of distinct 
slow waves (slower than 8 cps in 
man) and frequently also rhythmic 
faster activity (sleep “spindles”).
When the animal is awake, but re- 
laxed, the EEG is dominated by a 
rhythm of intermediate frequency 
(8-12 cps in adult humans, the well-known
alpha rhythm). When the
animal is alert the EEG is desynchro- 
nized, that is, rhythmic activity tends 
to disappear and is replaced by low- 
voltage fast (beta) activity. Magoun 
and his associates have used the term 
activation pattern to describe the 
desynchronized EEG and have called 
behavioral arousal or alertness be- 
havioral activation (126, 127). These 


terms will be used in the following
discussion.
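The EEG scheme just described can be sketched as a small classifier. This is only an illustration under stated assumptions: the function name `classify_eeg` and the use of a single dominant frequency are hypothetical simplifications, since the text characterizes the activation pattern by the loss of rhythmic activity and not by frequency alone. The 8 and 12 cps boundaries are the figures given above for adult humans.

```python
def classify_eeg(dominant_cps: float) -> str:
    """Label an EEG by its dominant frequency (cycles per second).

    Simplifying assumption: one dominant frequency stands in for the
    whole pattern. Boundaries follow the text for adult humans.
    """
    if dominant_cps <= 0:
        raise ValueError("frequency must be positive")
    if dominant_cps < 8.0:
        # Distinct slow waves: the "hypersynchronized" record.
        return "hypersynchronized (drowsiness or sleep)"
    if dominant_cps <= 12.0:
        # Intermediate rhythm of the relaxed waking state.
        return "alpha rhythm (awake, relaxed)"
    # Low-voltage fast (beta) activity: the "activation pattern".
    return "activation pattern (alert)"


for cps in (4.0, 10.0, 20.0):
    print(cps, "->", classify_eeg(cps))
```

Sleep spindles, which are faster rhythmic bursts occurring within the sleeping record, are deliberately left out of this sketch.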


² For simplicity’s sake the term EEG will
be used in this paper to refer to all recordings
of brain-electrical activity, whether rhythmic
or transient, and regardless of the location of
the recording electrodes.

Direct electrical stimulation of the
BSRF in relaxed or somnolent cats
results in activation of both the EEG
and behavior (141). On the other
hand, progressively more rostrad sec- 
tioning of the BSRF results in the 
progressively more prominent ap- 
pearance in the EEG of hypersyn- 
chronous activity, very similar to 
that seen in normal sleep, with cor- 
responding behavioral somnolence 
(126). After bilateral lesions of the 
hypothalamus, through which the 
ascending neuronal relays funnel 
(175), hypersynchronous EEG ac- 
tivity is persistent and can be re- 
corded both from the cortex and the 
intralaminar nuclei of the thalamus. 
This effect of hypothalamic lesion can 
be demonstrated unilaterally: if the 
lesion is confined to one side of the 
hypothalamus, hypersynchronous ac- 
tivity persists only over the ipsilat- 
eral hemisphere. Chronic cats with 
bilateral hypothalamic lesions, or 
with rostral-midbrain section of 
BSRF, remain somnolent as long as 
they survive, and hypersynchronous 
activity persists in their EEG’s (127). 
If some of the BSRF is spared how- 
ever, there is a tendency for EEG 
patterns to return to normal with 
time (107). 

It is clear from these results that 
the BSRF is essential to the mainte- 
nance of the waking state under nor- 
mal conditions. The next problem 
is that of the relationship of reticular 
activity to the afferent sensory ac- 
tivity which is capable of effecting 
the arousal of a somnolent animal 
under normal conditions.

If sensory end organs or peripheral 
nerves are stimulated, specific elec- 
trical responses can be recorded in 
the BSRF but not in adjacent struc- 
tures other than the classical sensory 
pathways and relay nuclei (176). 
This indicates the existence of col- 
lateral innervation of the BSRF from 
the long sensory afferent tracts (the 
medial and lateral lemnisci, etc.) 
as they pass the reticular formation 
in the brain stem, especially in the
region of the medulla, pons, and mid- 
brain. 

After transection of the BSRF, 
strong sensory stimulation results 
in transient behavioral and EEG ac- 
tivation, but such activation out- 
lasts the stimulus by only a few sec- 
onds (126). Therefore, sensory im- 
pulses arriving in the forebrain (thal- 
amus and cortex) can initiate the 
state of arousal, but arousal cannot 
be maintained in the absence of the 
BSRF. 

Stimulation of animals with intact 
BSRF’s, but with interruption of all 
sensory tracts in the midbrain rostrad 
of the entry of collaterals to the 
BSRF, results in prolonged EEG and 
behavioral activation, even though 
no arriving impulses can be detected 
in the appropriate sensory area of 
the cortex and specific behavioral 
responses to the stimuli do not occur.
Animals in a chronic condition
such as that just described continue
to exhibit sleep-wakefulness
cycles both behaviorally and in the
EEG. Thus it appears that the
arrival of sensory impulses at the
cortex is not essential to the initiation
and maintenance of wakefulness
or of the sleep-wakefulness cycle as
long as the BSRF and its corticipetal
projections are intact and as long as
sensory pathways are patent as far
as the level of the BSRF.

The experiments described thus
far were performed on cats. Some of
them have been repeated on monkeys
with the same results (53, 54), except
that the latter do not survive complete
lesions of the cephalic portion
of the brain stem. But in monkeys
incomplete lesions resulted
in greater somnolence and
EEG hypersynchrony than
was observed in cats. Unless much
of the reticular substance was spared,
even intense peripheral stimulation 
failed to elicit either behavioral or 
EEG activation. In man clinical 
observations suggest that destruc- 
tion of the reticular system of the 
brain stem or its isolation from higher 
functional areas results in prolonged 
somnolence, just as do similar ex- 
perimental lesions in cat and monkey 
(51, 88, 99, 179). 

Taken together these findings indi- 
cate that a background of main- 
tained activity in the BSRF accounts 
for the maintenance of wakefulness, 
while reduction of its activity precipi- 
tates a state of somnolence or un- 
consciousness. The BSRF mediates 
the activation of the forebrain which 
is associated with alertness following 
sensory stimulation. It is a moot 
question whether consciousness as a 
psychological state, in man at least, 
is possible without the cortex. It is 
likely that the BSRF constitutes a 
wakefulness center because of its 
influence upon the activity of the cor- 
tex. Further, in accordance with 
Kleitman’s hypothesis it would be 
expected that impulses from the cor- 
tex, as well as those from peripheral 
receptors, can excite the reticular 
formation, resulting in what Kleit- 
man calls “wakefulness of choice.” 
This has recently been demonstrated 
by French et al. (52) in the monkey,
where it was shown that stimulation 
of certain cortical areas produced 
electrical responses in the BSRF. 

The role of the thalamus in sleep 
has also received attention. During 
sleep, bursts of mixed fast and slow 
electrical waves can be recorded almost
simultaneously from the thalamus
and cortex (126). While a definite
interrelationship between such
cortical and thalamic activity appears
to obtain under normal conditions,
burst activity in those structures
is in reality at least semi-independent
(44, 82, 91, 105, 113), and
whether the thalamus dominates 
the cortex during sleep is an open 
question. Some indication that it 
may do so is found in reports of the 
experimental induction of sleep by 
electrical stimulation of certain tha- 
lamic nuclei (3, 75, 76, 136). 

The picture which emerges then 
is this: In the sleeping animal large 
areas of the cerebral cortex are in- 
undated by great waves of electrical 
activity, thousands of neurons beat- 
ing in synchrony. A diffusely project- 
ing thalamic system participates in 
this activity. Afferent sensory im- 
pulses can reach the primary cortex, 
but their spread appears to be 
blocked. The BSRF is relatively in- 
active, and apparently relatively in- 
sensitive to afferent sensory impulses. 
When the flow of afferent impulses 
(number of impulses per unit time) 
becomes supraliminal, or when the 
excitability threshold of the BSRF is 
reduced, as must occur for example at 
the end of a night's sleep, the BSRF 
is stimulated to activity and initiates 
a flow of impulses into the forebrain. 
The hypersynchronous electrical ac- 
tivity of the forebrain is thereby ar- 
rested and is replaced by low-voltage 
fast activity and alpha waves, as is 
seen in the EEG. Sensory impulses 
are again received, transmitted, and 
integrated in the cortex. The animal 
is awake and alert. 

This formulation leaves some ma- 
jor questions unanswered—such as 
what factors are responsible for the 
regular, but alterable, cyclic varia- 
tion of such processes in time—but 
the central nervous mechanisms of 
the cycle have been greatly elucidated 
by these studies. 


SENSATION 


The onset, and often the “offset,” 
of a stimulus may be followed by one 
or both of two classes of cortical responses:
(a) transient evoked potentials,
and (b) changes in the background
activity pattern.


Evoked Potentials in Man 


Recording evoked cortical poten- 
tial changes from the scalp is not as 
satisfactory as recording them from 
the exposed cortex, but in man it is 
the only available technique, except 
at the time of exploratory brain sur- 
gery. Such recording from the scalp 
has however been repeatedly demon- 
strated, not only using light flashes 
as stimuli (64, 137, 157), but also 
using somesthetic (39) and auditory 
(121) stimuli and following direct 
electrical stimulation of peripheral 
nerve (38, 40, 115). Cathode ray 
oscillographs are usually necessary 
to demonstrate such phenomena (38, 
137), but some evoked potentials can 
be recorded using conventional EEG 
amplifying systems (28, 48, 49, 55, 
158). 

The method can be used in deter- 
mining afferent transmission time in 
reaction-time experiments (137); and 
interaction of cerebral responses to 
visual and auditory stimuli has been 
demonstrated (181). The latter may 
be part of the physiological substrate 
of sensory-interaction phenomena. 


The Effects of Repetitive Stimulation 


Twenty years ago Adrian and Mat- 
thews reported that repetitive visual 
stimulation (“photic flicker”) was
capable of “driving” brain rhythms 
at the frequency of the stimulus. Sev- 
eral subsequent investigators con- 
firmed and extended these observa- 
tions (121). It has been reported that 
intermittent auditory stimulation too 
can “drive” brain rhythms, but not 
nearly as readily as visual (55, 61). 
There has been a recent revival of
intense interest in the use of photic 
flicker stimulation, stemming largely 


from the work of W. Grey Walter 
(203). As well as being an experimental
instrument of interest in neurophysiology
and neuropsychology,
flicker stimulation has proved useful
in epileptology (57, 96, 153, 156),
but we will not be concerned with 
this application. 


Stimulation Techniques 


Early work was done using a steady
light source interrupted by a rotating 
sectored disk. The frequency of 
stimulation was varied by varying 
the rate of rotation of the disk. More 
recently all-electronic stimulators 
have come into general use. The 
light source (stroboscope) is placed 
a few cm. from the eyes so as to 
stimulate all areas of the retina uniformly.
Typical stimulus values used
in EEG work are blue-white light,
8,000 candles, flash duration 15-50
μsec. (23, 56, 145, 199), with
flash rate continuously variable
(expressed in flashes per second, fps). The
very brief flash duration, which is
constant regardless of frequency, permits
the use of light of intensities
which would injure the retina at
higher light-dark ratios (191). Some
subjects are more sensitive to red
light than to that of shorter wave
lengths or to white light (23, 191).
For some purposes two or more stimulators
have been used simultaneously
(164, 199, 202).


Recording Techniques 


Standard EEG equipment is used
to record cerebral activity during
stimulation. One channel is used to
record the stimulus, which is usually
detected by a photoelectric cell.
Two recently developed accessory
devices make the flicker-stimulation
technique an even more
useful experimental tool: the automatic
ink-writing frequency analyzer
( 03, 199, 201), for the data
from which Gleser has recently developed
a method of statistical treatment
(60); and the toposcope (29,
30, 204, 206), by means of which the 
electrical activity of the brain is dis- 
played on a battery of cathode-ray 
tubes. 


Physiological Effects 

The simplest brain electrical re- 
sponse to repetitive light stimulation 
of the retina is the occurrence in 
neural elements subserving visual 
functions, or in related structures, of 
a series of evoked potential fluctua- 
tions at the frequency of the stimulus. 

The upper and lower frequency 
limits to which neural elements will 
follow a repeated stimulus are de- 
pendent upon a number of factors. 
One variable is neural level. Linds- 
ley (125) has observed following up 
to 100 per second in the retina, optic 
tract, and lateral geniculate of the 
cat, but only to 40 or 50 per second 
in the visual cortex. Walker et al. 
(194) observed following to 62 per 
second in the optic nerve, 59 per sec- 
ond in the lateral geniculate, and only 
to 34 per second in the visual cortex 
of the macaque. 
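The dependence of the following limit on neural level can be tabulated directly from the figures just quoted. The sketch below is only a lookup under stated assumptions: the names `FOLLOWING_LIMITS` and `levels_following` are hypothetical, and for the cat's visual cortex the lower of the two reported figures (40 per second) is used as a conservative choice.

```python
# Upper following limits (responses per second) as quoted in the text.
# Cat figures are Lindsley's (125); macaque figures are Walker et al.'s (194).
FOLLOWING_LIMITS = {
    "cat": {
        "retina": 100,
        "optic tract": 100,
        "lateral geniculate": 100,
        "visual cortex": 40,  # text says "40 or 50"; lower value assumed
    },
    "macaque": {
        "optic nerve": 62,
        "lateral geniculate": 59,
        "visual cortex": 34,
    },
}


def levels_following(species: str, flash_rate: float) -> list:
    """Return the neural levels whose reported limit covers flash_rate."""
    limits = FOLLOWING_LIMITS[species]
    return [level for level, limit in limits.items() if flash_rate <= limit]


print(levels_following("macaque", 60))  # only the optic nerve follows
print(levels_following("cat", 45))      # cortex has dropped out
```

The table makes the general trend visible: the limit falls as one ascends from the periphery to the cortex.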

Gastaut (56) recorded both from 
the subcortical white matter of the 
occipital lobe (through burr holes) 
and from the scalp in 9 “normal” 
human subjects. He frequently ob- 
served following in the optic radia- 
tions when none could be detected 
in the scalp tracings. 

In most work with human sub- 
jects it is feasible to obtain only re- 
cordings from the scalp. Under such 
conditions the brain rhythm follows 
the stimulus down to about 3 per 
second and up to about 25 per second 
(145). Evoked responses are maximal
with flicker in the alpha frequency 
band. Following above 25-35 per 
second is rare (56, 145). Following 
at low frequencies is more easily 
evoked in infants (124, 199) and 
children (55, 199), and at high frequencies
in old people (145, 199).

In addition to the evocation of 
brain waves at the frequency of the 
stimulus (fundamental frequency), 
cortical potential fluctuations are 
often observed at harmonic or sub- 
harmonic frequencies (17, 56, 145, 
199, 202). For example, with a stimu- 
lus frequency of 12 fps, evoked po- 
tentials may occur at 24 per second 
or at 6 per second. Third harmonics 
and subharmonics are also seen. 
Evoked responses do not, however, 
always occur exactly at multiples or 
submultiples of the flash frequency, 
but may deviate by plus or minus a 
few cps (199). 
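The arithmetic of harmonic and subharmonic responses can be made explicit with a short sketch. Everything named here is an assumption for illustration: `expected_responses` and `label_response` are hypothetical helpers, and the tolerance of 2 cps is an arbitrary stand-in for the "few cps" of deviation mentioned above.

```python
def expected_responses(flash_fps: float, max_order: int = 3) -> dict:
    """Frequencies at which evoked potentials might be expected."""
    out = {"fundamental": flash_fps}
    for n in range(2, max_order + 1):
        out[f"harmonic x{n}"] = flash_fps * n     # e.g. 12 fps -> 24 per second
        out[f"subharmonic 1/{n}"] = flash_fps / n  # e.g. 12 fps -> 6 per second
    return out


def label_response(observed_cps, flash_fps, tolerance_cps=2.0, max_order=3):
    """Name the nearest expected frequency, or None if none is close.

    tolerance_cps is an assumed value; the text says only that responses
    may deviate by "a few cps".
    """
    candidates = expected_responses(flash_fps, max_order)
    name, freq = min(candidates.items(), key=lambda kv: abs(kv[1] - observed_cps))
    return name if abs(freq - observed_cps) <= tolerance_cps else None


print(expected_responses(12))
print(label_response(25, 12))   # close to the second harmonic
print(label_response(17, 12))   # not near any expected frequency
```

With a 12 fps stimulus the table reproduces the example in the text: 24 per second for the second harmonic and 6 per second for the second subharmonic.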

The topographic distribution of 
harmonic and subharmonic responses 
in the brain is often more widespread 
than that of responses at the funda- 
mental frequency, which are usually 
maximal in the occipital-parietal re- 
gion (145). Third harmonics are 
seen significantly more often among 
old people than among young adults, 
second subharmonics significantly 
less often. 

Walter and Walter reported that 
the regularity and constancy of re- 
sponses tend to increase during the 
first few minutes of stimulation, while 
anomalous responses tend to subside, 
and cite these observations as evi- 
dence of long-term recruitment and 
facilitation, and of extinction or 
adaptation, respectively (199).


Sensory Effects 


In addition to the primary sensa- 
tion of visual flicker, a number of 
illusory effects, both in the visual (13) 
and other modalities, have been 
noted. These are of considerable in- 
terest, for the revelation of the mech- 
anisms of such illusory effects may 
yield information about the normal 
functioning of the nervous system. 


First the larger topic of visual illu- 
sions. 

Fusion. The best known illusion 
is that of fusion: above a certain 
flash frequency a flickering light ap- 
pears subjectively as a steady light. 
The lowest flash frequency at which 
this effect appears is called the criti- 
cal fusion frequency (cff), which 
varies with a number of factors (13). 

A point of controversy is whether 
the mechanism of fusion is central or 
peripheral (retinal) (47). Halstead 
and his colleagues (67, 194, 195) sug- 
gest that, since evoked rhythmic re- 
sponses in the visual cortex break 
down at about the cff, whereas those 
of subcortical and peripheral struc- 
tures follow the stimulus to higher 
frequencies, the fusion mechanism is 
a cortical one. Knox has reported 
that the subject's attitude can affect 
the cff (109), and that simultaneous 
auditory “flicker” enhances the “pronouncedness”
of visual flicker (111);
these effects must be central. On the 
other hand he was unable to demon- 
strate an effect of auditory “flicker” 
on the cff itself (110). 

Brightness enhancement. At high 
flash frequencies, the subjective 
brightness of a repetitive light stimu- 
lus is less than that of a steady light 
of the same intensity, but at about 10 
fps it appears brighter—the well- 
known Bartley effect. This rate, 10 
per second, coincides with the fre- 
quency of the alpha rhythm. It is 
also the frequency at which maximal 
evoked rhythmic responses of the cor- 
tex are obtained both with light 
flashes to the retina and with direct 
electrical pulses to the optic nerve 
(13). Bartley concludes that “a major
component of the optic-cortex response
to light involves the same cortical
elements as the alpha activity.”

Synesthesia. In addition to evoking 
such previously reported visual illusions
as pattern, color, and movement,
which have been abundantly
confirmed in recent EEG studies 
(17, 55, 145, 188, 197, 199), visual 
flicker stimulation sometimes evokes 
illusory sensations in other modalities.
Kinesthetic sensations, such as
swaying, swinging, spinning, and roll- 
ing occur fairly frequently during 
flicker stimulation (57, 145, 199), 
visceral sensations occasionally. Cutaneous
sensations (tingling, prickling)
are also reported (199). Auditory,
gustatory, and olfactory sensations
are rare, but do occur (145,
199). Synesthetic effects are reportedly
associated with strong evoked
cortical responses in the region of the
appropriate sensory area (199).
Walter (199) attributes anomalous,
nonvisual effects to interaction between
rhythmic evoked responses
and related spontaneous
rhythms in other circuits.
Visual subjective effects
he attributes to interference between
evoked activity and spontaneous
rhythms at the cortical and possibly
also thalamic levels.

According to Walter and Walter,
both the illusory effects of flicker
stimulation and the evoked cerebral
responses may be accentuated or
diminished by the mood or mental
activity of the subject. If the subject
evokes the illusory effects from
memory or imagination, the cortical
evoked responses are augmented;
if he inhibits the illusory
effects, the electrical responses are
diminished (199).

In addition to visual illusions and
synesthesias, outright hallucinations,
feelings of pleasantness and unpleasantness,
and anxiety states may be
evoked in some subjects.


Background Activity Changes 


Changes in background activity
of the cerebral cortex following the
onset of a stimulus depend upon the
activity pattern present at the time 
of stimulation (114), which in turn 
is related to the condition of the sub- 
ject. 

The best known of these effects is 
blocking of the alpha rhythm (121). 
The alpha rhythm, which is most 
prominent when the subject is re- 
laxed and unstimulated, is said to 
block when it is noticeably reduced 
in amplitude. Although a stimulus 
of any modality can block the alpha 
rhythm—blocking by olfactory stim- 
ulation being the most recently dem- 
onstrated (4)—visual stimuli are 
clearly the most effective. This, to- 
gether with the fact that the alpha 
rhythm is of highest voltage in the 
occipital region, has been taken to 
imply that the alpha rhythm is more 
closely related to visual processes 
than to other sensory processes (1). 

After initially blocking at the onset
of a continuous stimulus, the alpha
rhythm may reappear (especially if
the stimulus is monotonous or not
meaningful) and then block again at 
the “offset” of the stimulus. These 
phenomena could be related to the 
presence of “on-off” and “off,” as 
well as continuously firing, elements 
in the optic nerve. Further, the alpha 
rhythm may block if the subject 
merely imagines a stimulus, or if he 
concentrates on an abstract prob- 
lem. Alpha blocking has thus been 
interpreted as related to attention to 
stimuli rather than to stimulation per 


se.

When low-voltage fast activity
rather than the alpha rhythm predominates
in the EEG, as when the
animal is alert, the onset or offset of
a stimulus is followed by no change in
the background activity. On the 
other hand, if the animal is drowsy 
or in light sleep, the onset of a stimulus
may be followed by the appearance
of the alpha rhythm. Continuance
of the stimulus (especially if it
is not of great intensity and is monotonous)
may be followed by the redisappearance
of the alpha rhythm and
the reappearance of the patterns of 
drowsiness or sleep. If however the 
stimulus is attention-getting, and the 
animal is fully aroused, the low- 
voltage fast pattern typical of the 
state of alertness will replace the 
alpha rhythm. In deep sleep a stimulus
produces no change in the EEG
unless it is of extraordinary intensity. 
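The dependence of the background change on the prior state of the subject can be summarized as a lookup. This is a minimal sketch: the function name `background_change` and the state labels are hypothetical, and the "relaxed" case is taken from the earlier description of alpha blocking. The text does not say what change an extraordinarily intense stimulus produces in deep sleep, so that case is left unspecified.

```python
def background_change(state: str, extraordinary_intensity: bool = False) -> str:
    """EEG background change expected at stimulus onset, per the text."""
    if state == "relaxed":
        # Alpha rhythm present and prominent: it blocks.
        return "alpha rhythm blocks"
    if state == "alert":
        # Low-voltage fast activity already predominates.
        return "no change"
    if state in ("drowsy", "light sleep"):
        return "alpha rhythm may appear"
    if state == "deep sleep":
        if extraordinary_intensity:
            return "some change (not specified in the text)"
        return "no change"
    raise ValueError(f"unknown state: {state!r}")


for s in ("relaxed", "alert", "drowsy", "deep sleep"):
    print(s, "->", background_change(s))
```

The same stimulus thus yields opposite effects on the alpha rhythm (blocking or appearance) depending on where the subject stands in the sleep-wakefulness cycle.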


A Visual Attention Hypothesis 


Among the perennial questions in 
electroencephalography are those 
concerning the nature and functions 
of the “spontaneous” brain rhythms. 
The alpha rhythm has received most 
attention, and there will be occasion 
in this review to mention several of 
the hypotheses which have been ad- 
vanced concerning its function. 

In 1943 Adrian (1) suggested that, 
since the alpha rhythm appears to be 
more closely associated with vision 
than with other functions, it might 
be an alternative form of excitation 
which competes with afferent visual 
impulses for control of the cortex, and 
when it is in control, “enables us to 
divert our attention from visual stim- 
uli and yet to keep the unoccupied 
brain from falling asleep. Auditory 
and tactile stimuli can do this, but 
vision has a far more important ef- 
fect on the levels of activity of the 
brain. Certainly the α rhythm is the
characteristic product of the brain 
of one who is not seeing but is still 
awake.” The many people who do 
not show the alpha rhythm at all 
“may be people who cannot with- 
draw some degree of attention to the 
visual field as long as they are 
awake.” 

An early finding of Jasper and 
Cruickshank (85) appears to support 

Adrian’s hypothesis. They observed 
an inverse relationship between the 


strength of the alpha rhythm and the 
strength of visual afterimages, the 
afterimages waxing when the rhythm 
waned, and vice versa. 

Walsh (197) undertook an experi- 
mental test of Adrian’s hypothesis. 
He exposed subjects, reclining in 
darkness and fixating a small red 
light, to 1-10 luminous dots pre- 
sented tachistoscopically. Ability to 
recognize the patterns did not appear 
to depend on the amplitude of the 
alpha rhythm. He then exposed his 
subjects alternately to trains of clicks 
and flashes. The amplitude of the 
alpha rhythm when the subject was 
counting clicks did not differ signifi- 
cantly from the amplitude when he 
was counting flashes. Walsh concluded
that his results “do not support
the ‘visual attention’ theory.”


An Hypothesis of the Alpha Rhythm
as a Sensory Timing Mechanism

The alpha rhythm as recorded
from the scalp consists of synchronized
potential fluctuations in particular
aggregates of cortical neurons.
These potential fluctuations probably
“represent synchronized oscillations
in membrane potentials, possibly involving
interneurons and dendrites
in the cortical matrix, oscillations
which would have a definite effect
upon neuronal excitability, but not
dependent upon neuronal discharge”
(119). The validity of the excitability-cycle
assumption is supported
inferentially by experimental evidence
of Bartley and Bishop (14),
Bishop (18), and Bartley (12), and
more directly by that of Chang (25-27),
Gastaut et al. (58), and Morin
et al. (138). It is not however certain
that the cortical excitability
cycles demonstrated in the last-mentioned
studies are directly related to
the alpha rhythm.

Assuming, for the sake of argu- 
ment, that the alpha cycle actually 
does represent an excitability cycle,
it has been proposed as a means of
coding or timing sensory impulses,
“in order that our perceptual world
and our reactions to it are not dis- 
torted or smeared by the more or less 
continuous influx of sensory stimuli” 
(124). The proposed mechanism is 
relatively simple. The probability 
that incoming impulses will cause a 
neuron to fire will vary with the phase 
of the excitability (alpha) cycle. Im- 
pulses arriving at synapses when the 
trans-synaptic neuron is in the phase 
of increased excitability will be more 
likely to fire the trans-synaptic 
neuron, Those arriving during the 
phase of lowered excitability will be 
less likely to do so. And when the 
excitability cycles of a group of neu- 
Tons are synchronized, then the flow 
of impulses through that group will 
be timed by the frequency and phase 
of the cycle. 
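
The proposed gating can be illustrated with a minimal numerical
sketch. It assumes, purely for illustration, a sinusoidal
excitability cycle at 10 cps and a firing probability that rises
and falls with it. The function name, the baseline probability,
and the modulation depth are hypothetical and are not taken from
the studies cited.

```python
import math

ALPHA_HZ = 10.0  # assumed alpha frequency, cycles per second


def firing_probability(t, base=0.5, depth=0.4, freq=ALPHA_HZ):
    """Probability that an impulse arriving at time t (seconds) fires the
    trans-synaptic neuron, assuming a sinusoidal excitability cycle.
    base and depth are illustrative values only."""
    return base + depth * math.cos(2.0 * math.pi * freq * t)


# A steady influx: one impulse every millisecond for one second.
arrivals = [i / 1000.0 for i in range(1000)]
probs = [firing_probability(t) for t in arrivals]

# Split the arrivals by phase. The half-cycle with cos(...) > 0 is the
# phase of increased excitability; the rest is the lowered phase.
excitable = [p for t, p in zip(arrivals, probs)
             if math.cos(2.0 * math.pi * ALPHA_HZ * t) > 0]
depressed = [p for t, p in zip(arrivals, probs)
             if math.cos(2.0 * math.pi * ALPHA_HZ * t) <= 0]

mean_excitable = sum(excitable) / len(excitable)
mean_depressed = sum(depressed) / len(depressed)
```

Although the input in this sketch is continuous, the expected output
is concentrated in one half of each cycle, which is the sense in
which the flow of impulses would be "timed" by the cycle.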

Jung and his colleagues (89) have
conducted microelectrode studies of
the discharges of individual cortical
neurons. They distinguished three
kinds of neuronal discharges, classi-
fied according to their relation to nor-
mal brain rhythms: (a) spike dis-
charges entirely independent of nor-
mal cortical rhythms (such as alpha),
(b) discharges associated with certain
phases of the spontaneous rhythms,
and (c) rapid bursts of discharges as-
sociated with the surface-positive
evoked potentials in sensory areas
following stimulation. The vast
majority of all cortical neurons be-
long to class a. Neurons of class b,
which behave in a way predicted by
the timing hypothesis, were much less
frequently found in these experi-
ments.

A major objection to the identifi-
cation of the alpha cycle as a timing
mechanism is that the alpha rhythm
blocks during the reception of stimuli
to which the animal is paying atten-
tion. This objection is met by draw-
ing a distinction between the alpha
rhythm and alpha activity (124). 
The alpha-activity cycle is presumed 
to be a basic mechanism of the indi- 
vidual brain cell, but it is only when 
thousands of cells are acting in syn- 
chrony that there is sufficient voltage 
summation to produce a recordable 
alpha rhythm. A low-voltage fast 
EEG, as when the subject is alert, 
may represent a “fractional syn- 
chronization” in smaller aggregates 
of cells, but with random phase rela- 
tions. If this is true, alpha activity 
may exist in smaller aggregates of 
cells in the absence of a recordable 
alpha rhythm in the standard EEG. 
For a more detailed discussion, see 
Lindsley’s original paper (124). 


RESPONSE PROCESSES 


Beta Blocking and Voluntary Movement


Jasper and Penfield (86) have ob- 
served that the beta (low-voltage 18- 
30 cps) activity of the precentral 
gyrus blocks at initiation of a volun- 
tary movement (such as fist clench- 
ing), adapts, and reblocks at cessa- 
tion of the movement. Continuous 
varied movements are accompanied 
by prolonged blocking. Imagining a 
movement is not followed by beta 
blocking, but readying-to-move is.
Beta blocking may be quite circum- 
scribed to the area of representation 
of the part moved. 


Alpha Blocking and Motor Reaction
Time

Simple motor reaction times to 
light have been found to vary from 
about 150 to 500 msec. (173). Since 
both the alpha blocking response and
motor reaction may be initiated by 
the same stimulus and since their 
latencies are of the same order, it 
would seem reasonable that they are 
somehow related. Several studies 
have had a bearing on this question
(5, 65, 102, 173, 183, 185). The 
earlier studies, cited by Lindsley 
(121), indicated that latency of alpha 
blocking is reduced slightly by an 
associated motor response. However, 
correlations between alpha blocking 
times and motor reaction times, rang-
ing from .37±.09 (85) to −.19±.05
(102), did not clearly indicate a sig- 
nificant relationship between the two 
variables. Further, alpha blocking 
times are sometimes longer and some- 
times shorter than motor reaction 
times. 

The most recent study in this area 
is that of Stamm (173). He found 
that mean motor reaction times de- 
creased systematically and signifi- 
cantly under certain experimental 
conditions, but alpha blocking times 
(recorded concurrently) did not. The 
mean correlations between the two 
measures varied between .300 and
+.323 (σ's .152-.228) for 20 subjects.
Under every condition there were 
subjects who responded at least part 
of the time with alpha blocking times 
longer than reaction times. Stamm
concluded that motor reaction times 
and alpha blocking times are meas- 
ures of essentially independent sys- 
tems. 


Alpha Activity as a Response Timing 
Mechanism 


The hypothesis that alpha activity 
may serve as a means of timing sen- 
sory impulses entering the brain has 
already been presented. It has also 
been proposed that it may serve as a 
means of timing motor impulses leav-
ing the brain and that these functions
may be coordinate. 

Kibbler and his co-workers (97, 
98) concurrently recorded the EEG 
and various voluntary muscular re- 
sponses to auditory stimuli. Plotting 
responses against phase of the alpha
rhythm, they found that the proba-
bility of a response was not randomly
distributed in time, but rather 
showed peaks and troughs recurring 
about 10 times a second in accurate 
phase relationship with the alpha 
rhythm. Bates (15) similarly found 
a significant tendency for voluntary 
movements to be initiated always at 
the same point in the phase of the 
alpha wave. 
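
The kind of analysis described here, plotting responses against the
phase of the alpha rhythm, can be sketched as follows. This is only
an illustration under simplifying assumptions: the alpha rhythm is
idealized as a perfectly regular 10 cps cycle, the response times
are invented, and the mean resultant length is a standard circular
statistic introduced here for convenience, not a measure reported
by the authors cited.

```python
import math


def alpha_phase(t, freq=10.0):
    """Phase (0 to 1) of an idealized alpha cycle of `freq` cps at time t (s)."""
    return (t * freq) % 1.0


def phase_histogram(response_times, n_bins=10, freq=10.0):
    """Count responses falling in each of n_bins equal phase bins."""
    counts = [0] * n_bins
    for t in response_times:
        b = min(int(alpha_phase(t, freq) * n_bins), n_bins - 1)
        counts[b] += 1
    return counts


def resultant_length(response_times, freq=10.0):
    """Mean resultant length: near 0 when phases are evenly spread,
    near 1 when every response occurs at the same phase."""
    n = len(response_times)
    c = sum(math.cos(2 * math.pi * alpha_phase(t, freq)) for t in response_times) / n
    s = sum(math.sin(2 * math.pi * alpha_phase(t, freq)) for t in response_times) / n
    return math.hypot(c, s)


# Hypothetical response times in seconds.
locked = [0.125 + 0.1 * k for k in range(20)]     # always a quarter-cycle in
spread = [0.005 + 0.0137 * k for k in range(20)]  # unrelated to the cycle

locked_hist = phase_histogram(locked)
locked_r = resultant_length(locked)
spread_r = resultant_length(spread)
```

A histogram with pronounced peaks and troughs, or a resultant length
well above zero, corresponds to the nonrandom distribution of
responses over phase that these studies report.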

Meister (135) concurrently re- 
corded EEG and saccadic eye move- 
ments in four subjects. Although 
readable tracings were difficult to 
obtain because of alpha blockade by 
visual stimulation, his data also indi-
cated a significant relationship be-
tween the inception of eye move-
ments and the phase of occipital-
parietal alpha waves.

The results of these studies sup- 
port the hypothesis that the alpha 
rhythm (or alpha activity) may be a 
means of timing the outflow of motor 
impulses from the brain. 


PERCEPTION 


Devising theoretical models of 
physical processes corresponding to 
mental processes is useful to the psy- 
chophysiologist only insofar as such 
models predict how the brain itself 
actually operates, or give rise to con- 
structive suggestions for future work. 
MacKay has offered an excellent dis- 
cussion of the usefulness of brain- 
machine analogies (132). 

Several theoretical neural models 
have been devised which offer ex- 
planations of perceptual phenomena. 
Only those which involve the alpha 
rhythm will be discussed here. 


A Scanning Hypothesis 


Grey Walter has postulated the 
need for a mechanism in the space 
receptor areas of the brain, “whereby 
the sensory field could be scanned 
continuously in such a way that the 
detailed bits of information they con-
tain could be conveyed to a central
assembly by only a few channels,
there to be related one with another
for appropriate action to be taken”
(202). “One characteristic of such a
scanning mechanism would be that 
on the instant that a signal appeared 
on its beat it would be halted, and 
its position at the check would con- 
vey to all other regions the relative 
position of the detail of sensation. 
When a complex pattern appeared, 
the succession of runs and checks 
would repeatedly convert a spatial
pattern which was constant during 
the time of a single sweep . . . into a 
series of signals on a base of time, so 
that all the information contained in 
a single parameter of sense can be 
conveyed on a single channel in a 
code of unit pulses.” 

“The price of such parsimony is that
reception speed is limited by the
sweep frequency; changes in space
which occur more frequently or
movements which cover a great dis-
tance in the time of the sweep will re-
sult in smearing of the pattern. In
addition, intermittent signals which
are shorter than the duration of the
sweep but recur at about its fre-
quency will give the illusion of move-
ment. . . . (A)nd also, if action is
to be taken on the basis of informa-
tion received from the field scanner
. . . the outflow of centrifugal mes-
sages must be to some extent regu-
lated by and synchronized with the
time base on which the centripetal
ones are coded.”

Walter goes on to propose that the
alpha rhythm constitutes such a
scanning mechanism. It is immedi-
ately apparent that this hypothesis
accounts for some of the observed
phenomena already described in this
paper: alpha blocking at the onset of
stimulation, the movement illusion
with flicker stimulation, and the
synchrony of the alpha rhythm with
the initiation of voluntary move-
ments. The hypothesis can also ac-
count for the upper time limit for per-
ception of a pattern (1/10 second)
and for the phenomena of apparent
movement. Walter has reported, as
would be expected from the hypothe-
sis, that in persons who employ
“vivid and plastic visual images” for
mental tasks the alpha rhythm is
discontinuous and frequency analy-
ses of the alpha rhythm are complex
and variable (203).

Pitts and McCulloch independ-
ently conceived of a scanning-mecha-
nism role for the alpha rhythm (155).
They showed that a circuit resem-
bling a nerve net could be drawn,
which could extract from a pattern of
stimuli certain invariant character-
istics (for example, in vision shape
regardless of size or in audition
chordal structure regardless of pitch),
“if it were rhythmically excited to
‘scan through’ the group of trans-
formations with respect to which that
characteristic was invariant.” Mc-
Culloch and Pitts’s nerve-net mod-
els³ were arranged in layers (as in
the cerebral cortex) and their hy-
pothesis required that a rhythmic
scanning sweep step up and down
through the layers. They felt that
the alpha rhythm satisfied the re-
quirement. Their over-all scanning
hypothesis is in a number of respects
similar to Walter’s, and McCulloch
cites many of the same data in its
support.

Walsh (197), O'Hare (147), and
MacKay (131) have experimentally
tested deductions from the scanning
hypothesis. Walsh’s deductions and
results are as follows. (a) Reaction
time should vary with the amplitude
or the phase of the alpha rhythm at
the moment of stimulation; it did
not. (b) Visual threshold should also
so vary; it did not. (c) Reaction time
scatter should not exceed 0.1 second;
it did. (d) The distribution of reac-
tion times should be rectangular; it
was positively skewed.

3 For details and related speculations see the
original paper (155) and McCulloch's Hixon
Symposium paper and the discussion which
followed it (130).

O'Hare repeated part a of Walsh's 
study using a larger number of sub- 
jects and measuring both visual and 
auditory reaction times. He too ob- 
tained negative results. 
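
Walsh's deductions (c) and (d) can be made concrete with a small
sketch. It assumes a strict form of the scanning hypothesis in which
a stimulus has no effect until the next sweep begins, with sweeps
recurring every 0.1 second and stimuli arriving at moments unrelated
to the sweep. These assumptions, the function names, and the stimulus
times are simplifications introduced here for illustration; they are
not Walsh's own derivation or data.

```python
def waiting_time(stimulus_time, period=0.1):
    """Delay (s) until the next sweep begins, for sweeps starting at
    0, period, 2*period, ... (an idealized scanning cycle)."""
    return (-stimulus_time) % period


def skewness(xs):
    """Moment coefficient of skewness: 0 for a symmetric distribution."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical stimulus times, spaced so as to bear no fixed relation
# to the 0.1-second sweep.
stimuli = [0.00137 * k for k in range(1000)]
waits = [waiting_time(t) for t in stimuli]

scatter = max(waits) - min(waits)  # spread added by waiting for the sweep
skew = skewness(waits)             # close to 0 for a rectangular distribution
```

Under these assumptions the scanning delay contributes at most 0.1
second of scatter and is spread evenly (rectangularly) over that
range, which is what deductions (c) and (d) state. The observed
reaction times exceeded this scatter and were positively skewed.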

MacKay argued that “if size-in- 
variance were in fact secured by a 
rhythmic transformation of the size 
of the mapped visual image in the 
occipital cortex at the alpha-fre- 
quency, then it would be not unrea- 
sonable to look for some kind of 
stroboscopic effect if the subject were 
presented with a visual image that 
was itself fluctuating in size at or 
about the same rate.” He tested this 
deduction using a specially designed 
“pattern-generator.” The predicted 
results were not obtained. 

MacKay concluded that these fail-
ures do not disprove the scanning hy-
pothesis, but they “appear to cir-
cumscribe the role attributable to
the alpha rhythm in such specula-
tions.”


A Movement-Perception Hypothesis


When one voluntarily scans a vis- 
ual field, the objects in that field do 
not appear blurred or smeared but 
remain clear and distinct. The same 
is true of an object moving across the 
visual field of a stationary eye. On 
the other hand, when the eye moves 
involuntarily, as when the eyeball is 
pushed with the finger or during 
nystagmoid movements of vestibular 
origin, the visual field appears to 
move and objects appear blurred. 
Because he finds it difficult to ex- 
plain these phenomena in terms of a 
visual system in which data (im-
pulses) are transmitted continuously
from the retina to the visual areas
of the cortex, Meister (135) has pro-
posed that retinocortical transmis-
sion is not continuous, but rather,
intermittent; that there exists a
“neuronic shutter mechanism” which
allows visual data to reach the cortex
in “discrete units.” The basic mecha-
nism is that of summation in which
the nerve elements responsible for the 
alpha rhythm summate with retino- 
cortical impulses at a synaptic trans- 
mission point. The locus of such a 
mechanism may be either subcortical 
(geniculate?) or cortical, or perhaps 
both. 

Meister’s formulation requires that 
voluntary and saccadic eye move- 
ments be so synchronized with the 
shutter mechanism that retinocorti- 
cal transmission occurs only when the 
eye is stationary. (His study of alpha 
phase at the initiation of saccadic 
eye movements, already cited, was 
done in this connection.) 

Meister draws an analogy between 
his neuronic shutter mechanism and 
a motion picture camera which moves 
when the lens shutter is closed and is 
stationary when the shutter is
opened. Thus movements of the 
visual field and smearing are avoided. 
He also shows how the hypothesized 
mechanism can explain the phenom- 
ena of apparent movement, gamma 
movement, the spoke illusion, and 
Charpentier's bands, and how dis- 
orders of the system may account 
for such pathological phenomena as 
the quick motion illusion and mo-
nocular diplopia and polyopia.

The similarity of this “shutter” 
hypothesis to the timing and scan- 
ning hypotheses is apparent. It is 
interesting to note that these similar 
hypotheses were arrived at inde- 
pendently and that many of the same 
data from the literature have been 
cited in support of each of them. 
Lindsley (124) has cited Meister’s 
hypothesis as an example of the ap-
plication of the broader timing hy-
pothesis.


COMPLEX PROCESSES


A Diffuse Thalamic Projection System


Recent work has revealed the 
existence of a diffuse thalamo-corti- 
cal system, separate from the better 
known specific projection systems, 
and has determined some of its rela- 
tionships with other parts of the 
brain. 

In the early 1940's Morison and
Dempsey (41, 42, 139) reported re-
cording electrical waves from wide-
spread areas of the cortex during low-
frequency repetitive electrical stimu-
lation in the vicinity of the internal
medullary lamina of the thalamus.
These responses exhibited recruit-
ment, that is, they increased in am-
plitude with the first few stimuli.
The recruiting potentials were simi-
lar to spontaneous 8-12 cps poten-
tials in a number of respects, and
Morison and Dempsey suggested that
they are identical. Since World War
II the work of Morison and Dempsey
has been extended, principally in the
laboratories of Jasper (2, 68, 69, 79,
82-84) and Magoun (174, 177, 192,
193).

Anatomy. For details of the tha-
lamic structures involved and their
connections, see the papers just cited.
Jasper has called the elements of
the structures involved in recruit-
ing responses the “thalamic reticular
system.”

Physiology. Single electric pulses
delivered to the thalamic reticular
system evoke rhythmic bursts of 8-
12 per second waves in the cortex.
Repetitive stimuli at 8-12 per second
are accompanied by augmentation of
the cortical rhythm. Stimuli
at 4-5 per second tend to evoke
double responses (at 8-10 per second)
and still slower stimuli even triple re-
sponses. Stimuli at 15-20 per second
tend to evoke responses only at every 
other pulse. Stimuli at 30-50 per 
second suppress the recruiting re- 
sponse (82). Simultaneous stimula- 
tion of the BSRF and the thalamic 
reticular system results in suppres- 
sion of the recruiting response (69). 

If the thalamic reticular system is 
stimulated strongly, it tends to func- 
tion as a unit, evoking recruiting re- 
sponses in all or most of the cortex 
(177). If, however, just-adequate 
stimuli are delivered to discrete por-
tions of the system with small bipolar
electrodes, more discrete projection 
to the cortex can be demonstrated 
(69, 82). 

All recruiting phases of the recruit-
ing response are of local origin, that
is, recruitment occurs in the thala- 
mus independently of the cortex and 
vice versa (69, 192). Each local re- 
sponse is believed to represent an 
oscillatory phenomenon developing 
along closed chains of neurons. The 
time relations of the responses from 
deep structures and those in the cor- 
tex of the intact brain suggest how- 
ever that the processes are closely re- 
lated. Verzeano et al. (192) have 
hypothesized a complex of reverber- 
ating circuits, consisting of intra- 
cortical and intrathalamic reverber- 
ating circuits plus thalamocortical 
and corticothalamic connections be-
tween them, to account for the condi-
tions observed.

4 Jung and Riechert have recently demon-
strated recruitment phenomena following
stimulation in the medial and intralaminar
thalamic nuclei in man, but they were un-
able to demonstrate the suppression of cortical
rhythms by high-frequency (30-100 per sec-
ond) stimulation (90). Hassler and Riechert
however were able to produce behavioral
arousal with high-frequency stimulation of the
intralaminar nuclei or centre median (71).

Functions of the diffuse thalamic
projection system. It seems to be
agreed among those who have been
investigating this system (69, 82,
83, 84, 177) that, because of its struc- 
ture and its connections and func- 
tional relationships with the cortex 
and deep brain structures, it is ideally 
suited to function as a central inte- 
grative mechanism participating in 
such functions as learning and think- 
ing. Jasper and Ajmone-Marsan (84) 
suggest that it can regulate the ar- 
rival and the elaboration of impulses 
within the sensory receiving area of 
the cortex, and since the action of 
the thalamic portion of the system 
can be limited to one specific system, 
a mechanism for the central control 
of attentive processes is suggested. 
This is a general form of Adrian’s 
visual attention hypothesis (above) 
but one which does not suffer from 
Walsh's negative experimental re- 
sults. Hunter and Jasper (79, 82) and 
Penfield (152) believe that the diffuse 
thalamic system is also involved in
the genesis of some kinds of epileptic
seizures.


Conditioning, Learning, and Memory


Conditioning of the Alpha Blocking
Response


It has been repeatedly demon- 
strated that the alpha blocking re- 
sponse can be conditioned (80, 87, 
106, 116, 128, 140, 143, 144, 184). 
The usual procedure is to use a weak 
sound stimulus as the conditioned 
stimulus (CS), one which by itself 
usually does not elicit alpha blocking. 
A light is the unconditioned stimulus 
(UCS). After only a few pairings of 
CS and UCS, the alpha rhythm will 
block to the CS alone, constituting 
the conditioned response (CR). The 
criterion of conditioning is usually
2-5 CR’s without UCS. Extinction 
is relatively rapid (87, 128, 184).

Jasper and Shagass (87) estab-
lished CR’s in 20 trials with 10 sec-
onds delay between CS and UCS; de-
layed CR’s were always anticipatory.
The longer the delay the more trials
were needed to condition. Delayed 
conditioning was also reported by 
Morrell and Ross (140) and Iwama 
(80). Trace conditioning has been 
established with 4 seconds delay 
(106) and 9-10 seconds delay (87). 
Differential conditioning (80, 87, 
140, 144), and cyclic and backward 
conditioning (87) have also been 
demonstrated, but backward condi- 
tioning required 100 trials to a cri- 
terion of 2 consecutive CR’s without 
UCS. 

Shagass (165) conditioned the 
alpha blocking response to a volun- 
tary CS (fist clenching) in 7 of 8 sub- 
jects. The number of trials to a cri- 
terion of 5 CR’s without UCS varied 
from 8 to 138. Shagass and Johnson 
(167) showed that the acquisition 
curve for such a CR, using a pro- 
cedure in which half of the trials were 
reinforced, was an accelerated one, 
nearly the mirror image of the extinc- 
tion curve. Acquisition and extinc- 
tion curves were similar to those ob- 
tained when peripheral responses are 
conditioned.

Laufberger (116) reported condi- 
tioning the alpha blocking response 
using imaginary CS and UCS in 
about 100 trials. The CS consisted 
of thinking of the nonsense syllable 
“ki,” the UCS of thinking of a light. 
Extinction was not mentioned. 

Motokawa and Huzimori (144) 
recorded the EEG during the acquisi- 
tion of a conditioned galvanic skin 
response (GSR), using a bell as CS 
and faradic shock as UCS. They 
distinguished 3 EEG responses which 
they termed “excitation potentials” 
(“Ep”): (a) alpha blocking, (b) beta 
augmentation, and (c) irregular base- 
line deflections. The last may have 
been artifacts, but the authors be-
lieved not. The first two are identi-
fiable with Magoun’s activation pat-
tern.


Motokawa and Huzimori observed
that the “Ep” response developed
during the acquisition of the condi-
tioned GSR, occurring before the GSR
itself; it was more easily established
as a CR and resisted extinction
longer. During extinction the basic
alpha rhythm appeared stronger than
before or during conditioning. Spon-
taneous disinhibition during extinc-
tion was associated with an excita-
tion pattern in the EEG. During
the delay period in delayed condi-
tioning the alpha rhythm was un-
usually strong; the “Ep” response
occurred about 2 seconds before the
peripheral CR, and was protracted
through the period of the peripheral
response. The authors proposed that
the peripheral CR is secondary to a
central CR of which the excitatory
response of the cerebral cortex is a
component, while the presence of a
strong alpha rhythm is indicative on
the other hand of a cortical inhibitory
state.

Further data pertaining to these
observations have been reported.
Motokawa (143) found that it is
[illegible] to establish differential condi-
tioning of the “Ep” response than
peripheral responses. Iwama (80),
working in Motokawa’s laboratory,
established a delayed alpha blocking
CR to a metronome as CS, with a
delay period of 20 seconds. At the
onset of the metronome the ampli-
tude of the alpha rhythm increased
and at about the 20th second was re-
placed by beta activity. He inter-
preted the initial alpha augmenta-
tion as an indication of internal inhi-
bition. Extraneous stimuli presented
during the delay period resulted in a
[illegible] train of beta waves (dis-
inhibition). After differentiation or
extinction were established it was
found that alpha waves were well-
[illegible] in the whole extent of the
encephalogram (differential or extinc-
tive inhibition).

Morrell and Ross (140) condi-
tioned the alpha blocking response to
a buzzer (CS). The UCS was a flick- 
ering light. Reaction times to the 
light stimulus were obtained concur- 
rently with the EEG record, thereby 
providing a measurement of cortical 
conduction time each time the CR 
was reinforced. When the light stim- 
ulus was presented after the CR was 
extinguished, the reaction time was 
found to be prolonged (up to 800 
msec.; pre-extinction time 200-300
msec.). Differential inhibition 
(sounding a whistle, the CR to which 
has been extinguished, simultane- 
ously with the buzzer) and delayed 
inhibition (conditioning alpha to 
block after a delay of 6 seconds and 
then presenting the light in the inter- 
val) also lengthened the reaction 
time. The authors assumed that
retinocortical time and corticomotor 
time are constant and so the increase 
in reaction time is due to an increase 
in intracortical transmission time. 
Therefore they interpret their results 
as confirmation of the Pavlovian 
hypothesis that extinction of the CR 
gives rise to a process causing an in- 
crease in cortical transmission time. 


Alpha Blocking and Stimulus Trace 


Knott et al. (108) sought to deter- 
mine if the blocking of the alpha 
rhythm during and following a brief 
stimulus might serve as a measure of 
central neural activity which could 
be correlated with stimulus trace as 
postulated by Hull. They measured 
the total peak-trough amplitude of
alpha waves in mm. for a number of 
short intervals during the presenta- 
tion of a 3-second auditory stimulus 
and also following a 0.2-second stimu- 
lus. The total amplitude obtained 
was divided by the mean prestimulus 
amplitude, yielding an amplitude 
ratio. The shape and duration of 
amplitude ratio curves (plotted 
against time), when compared with 
plots obtained in earlier investiga-
tions of the CR as a function of the
CS-UCS interval, suggested that cor- 
relates of Hull’s stimulus trace and 
perseverative stimulus trace are pro- 
vided. 
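
The amplitude-ratio measure described above is simple enough to
sketch. In the example below the amplitude totals are invented
numbers, and the function name and the use of equal-length
prestimulus intervals for the baseline are assumptions made for
illustration, not details taken from the original study.

```python
def amplitude_ratio(interval_total_mm, prestimulus_totals_mm):
    """Total peak-trough alpha amplitude (mm) in one short interval,
    divided by the mean total for prestimulus intervals of the same length."""
    baseline = sum(prestimulus_totals_mm) / len(prestimulus_totals_mm)
    if baseline == 0:
        raise ValueError("prestimulus amplitude must be nonzero")
    return interval_total_mm / baseline


# Hypothetical totals (mm) for three prestimulus intervals and for five
# successive intervals during and after a brief stimulus.
prestimulus = [40.0, 44.0, 36.0]
after_onset = [38.0, 12.0, 8.0, 20.0, 36.0]

ratios = [amplitude_ratio(a, prestimulus) for a in after_onset]
```

A ratio near 1 means the alpha rhythm is at its prestimulus level;
ratios well below 1 mark the period of blocking. Plotting the ratios
against time gives the kind of curve whose shape and duration were
compared with the stimulus-trace functions.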


EEG Changes During Serial Learning 


Obrist (146) tested several corol- 
laries of a hypothesis that rate of 
learning is a function of the excita- 
tory effect of the stimulus upon the 
organism. Fifteen college students 
learned a list of 16 equated nonsense 
syllables to a criterion of one perfect 
recitation. GSR and EEG were re- 
corded during the learning trials and 
during a 3-minute nonlearning con- 
trol period. Central excitatory effect 
was measured for each syllable and 
each trial in terms of (a) summed 
GSR in log-conductance units and 
(b) summed alpha voltage, on the 
basis of previous experimental evi- 
dence indicating that degree of 
arousal is directly proportional to
GSR and inversely proportional to
alpha amplitude.

Magnitude of GSR was positively
related to (a) learning as compared
with nonlearning, (b) individual rate
of learning, (c) individual rate of
learning of single syllables, and (d)
group rate of learning. A typical
bow-shaped serial position curve was
obtained for magnitude of GSR when
plotted.

Alpha voltage was significantly
less (7-14%) during learning than
during nonlearning, and a positive
correlation of .60 was obtained be-
tween subjects’ alpha frequencies and
their rates of learning, but there were
no significant correlations between
alpha voltage and (a) rate of learning,
(b) number of correct anticipations,
and (c) magnitude of GSR.
Obrist concluded that his GSR find-
ings lend support to the hypothesis
that learning is associated with a
high degree of arousal or attention,
and that attention is maximal for a
given group of syllables during the 
time when the greatest amount of 
learning takes place. “Contrary to
most present-day theories employing 
concepts of inhibition, the GSR re- 
sults suggest an explanation of serial 
position effects in terms of a factor 
of excitation. . . .” He also feels
that frequency changes in EEG may 
be meaningfully related to the learn- 
ing process. 


A Neural Mechanism for Memory 


Some epileptic patients experience 
minor seizures, or auras preceding 
major convulsive seizures, which con-
sist of memory illusions (déjà vu,
jamais vu) and/or memory hallucina-
tions (dream-like auditory or visual
experiences derived from the pa-
tient’s earlier life). These illusions
and hallucinations differ from those 
reported by psychotics in that the pa- 
tient is quite aware of their unreality, 
however vivid they may be. Epilep-
togenic foci in these cases are re-
vealed by the EEG to lie in the tem-
poral region.

Penfield and his colleagues have,
over a period of more than 20 years,
operated upon hundreds of patients
with such psychic seizures and other
forms of focal epilepsy (153, 154).
During operation they have deter-
mined the effects of direct electrical
stimulation at thousands of cortical
and subcortical points. The position
of the stimulated points is recorded
by photography, the electrical re-
sponses of the brain by electrography
with the recording electrodes on the
cortex or in the brain substance, and
the motor, sensory, and psychic
effects of the stimulation are re-
corded by a sound recording of the
patient's and surgeon's comments.
The operations are performed under
local anesthesia.


In a number of cases memory
illusions and hallucinations have been
elicited by stimulation of the tem-
poral lobe or the adjacent posterior-
parietal cortex. They have never
been elicited by stimulation else-
where. Generally auditory hallu-
cinations are evoked by stimulation
of the anterior part of the temporal 
lobe and visual from the posterior, 
but this is not invariable. Repeated 
stimulation of a given point will 
often elicit the effect over and over 
again, but sometimes hallucinations 
will change or a different memory 
will be evoked. Such instability of 
stimulation effects is not peculiar to 
these phenomena, having been ob- 
served elsewhere in the brain, even 
in the primary motor cortex. 

Penfield has proposed (151) that
the recording and retention of in-
formation are primarily functions of
the “memory cortex” (the temporal
cortex of both hemispheres, exclud-
ing Heschl’s gyrus and some portions
of the inferior surface) in conjunction
with a central integrating system.
The integrating role is attributed to
what he and Jasper have called the
“centrencephalic system,” which is
defined as “that neuronal system of
the higher brain stem which has been
in the past, or may be subsequently
proved, to have equal functional rela-
tionship to the two cerebral hemi-
spheres.” Under this definition are
included the BSRF and the thalamic
reticular system, the latter of which
at least has connections with all of
the sensory and language areas of the
cortex. Cortical sensory areas are
considered as way-stations between
the periphery and the central in-
tegrating system.

According to Penfield’s hypothesis,
integration and fusion of all types
of [illegible] are achieved in the cen-
trencephalic system, and from there
they are projected to the temporal
cortex of both hemispheres, where
they lie dormant. Recording is as-
sumed to occur by synaptic facilita-
tion produced by the passage of
impulses. Voluntary recall is effected 
by impulses from the centrencephalic 
system to both temporal cortices in 
pathways similar to those originally 
followed by the impulses that laid 
down the memory pattern. These 
impulses evoke the original pattern 
of cortical discharges, which runs off 
as a time series. Specific patterns 
may also be evoked by epileptic 
discharges in, or direct electrical 
stimulation of, the temporal regions 
of the brain. 

The following objections to this 
hypothesis were made by Lashley 
during the discussion of Penfield’s 
paper (151). (a) Bilateral removal of 
the temporal lobes in animals does 
not completely abolish memories, 
since they can be recovered spon- 
taneously, and therefore memory 
traces cannot be stored exclusively 
in the temporal lobes. (b) The small 
numbers of neurons in the thalamic 
nuclei and the centrencephalic sys- 
tem and the paucity of interneurons 
make it difficult to attribute the 
mediation of complex functions to 
those structures. 

To these arguments Penfield re-
plied: (a) In higher mammals equipo-
tentiality is less striking than in
lower. [illegible] patterning, how can
memories be evoked by [illegible]
stimulation? (b) Since the
centrencephalic system is not exclu-
sive, but functions with other areas,
including the entire cortex, its pau-
city of neurons is not a valid objec-
tion.

In a later paper (152) Penfield sug-
gests that the centrencephalic sys-
tem may be divided into an A-
mechanism and a B-mechanism. The
function of the A-mechanism is to
record conscious perceptions in per-
sisting neuron patterns, and the B-
mechanism functions in recollection
of past experiences and in the inte- 
gration of the sensory and motor 
systems. It has further been sug- 
gested (153) that the temporal- 
parietal cortex in conjunction with 
the pulvinar-lateralis-posterior com- 
plex of the thalamus plays a major 
role in the initial integration and 
elaboration of incoming sensory pat- 
terns. 

Penfield’s hypothesis clearly does 
not provide a complete account of 
the neural events underlying learning 
and memory. At present the hy- 
pothesis rests almost entirely upon 
one fact, that of the temporal locali- 
zation of memory engrams in epilep- 
tics. That the centrencephalic 
system is an integrating system, 
however logical it may seem, is at 
present only an assumption. But 
the hypothesis is intriguing and 
should give rise to additional re-
search.


Intellectual Processes 
Intelligence 


Studies of relationships between 
EEG variables and intelligence have 
been relatively few. Lindsley (121) 
reviewed the literature up to 1944 
and concluded that “it appears 
doubtful that there is any very high 
degree of relationship between in- 
telligence as measured by tests and 
the EEG.” Ostow drew a similar 
conclusion in 1950 (148). 

Only three references to the rela- 
tion between EEG and intelligence 
have appeared since Ostow’s review.
Kreezer and Smith (112) correlated 
Stanford-Binet MA with various 
properties of the alpha rhythm in a 
group of mental defectives. The 
correlation between MA and alpha 
frequency with CA partialed out was 
.22 (not significant); correlations 
between alpha amplitude and MA
and alpha index and MA were
negligible. 
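
The "partialing out" of CA in the Kreezer and Smith result can be illustrated with the usual first-order partial correlation formula. The sketch below is illustrative only: the function name and the three input correlations are hypothetical and are not values taken from the study.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant.

    All three arguments are ordinary (zero-order) Pearson correlations.
    """
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denominator == 0.0:
        raise ValueError("undefined when |r_xz| or |r_yz| equals 1")
    return (r_xy - r_xz * r_yz) / denominator


# Hypothetical zero-order correlations (NOT from Kreezer and Smith):
# x = mental age (MA), y = alpha frequency, z = chronological age (CA).
r_ma_alpha = 0.40
r_ma_ca = 0.50
r_alpha_ca = 0.45

print(round(partial_correlation(r_ma_alpha, r_ma_ca, r_alpha_ca), 3))
```

When both variables rise with age, as assumed in these made-up inputs, the partial coefficient is smaller than the zero-order one, which is why an age-corrected value is the relevant figure in a sample of mixed ages.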

Rey, in a study cited by Hill (78), 
found that patients with paroxysmal 
EEG abnormalities differed from 
those with nonparoxysmal abnor- 
malities at the 1 per cent level of 
confidence with respect to having 
verbal scores lower than nonverbal 
scores, using the Mill Hill Vocabulary 
test and Raven's Progressive
Matrices. The tests did not dis- 
tinguish between persons with non- 
paroxysmal abnormalities and those 
with normal EEG's. 

Walter (203) reports a measure of 
the “versatility” of the brain which 
he believes is related to intelligence. 
“Versatility” is defined in terms of 
the variability of the EEG frequency 
spectrum (obtained by automatic 
frequency analysis) from one 10- 
second epoch to another. The less 
intelligent subject is reported to show 
a lack of versatility in this sense, that 
is, his frequency spectrum is much 
the same from one short epoch to the 
next. The very intelligent subject 
is reported to display great vari-
ability.
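
One way such epoch-to-epoch variability might be quantified is sketched below. This is an assumption for illustration, not Walter's actual procedure: the function name, the use of the coefficient of variation, and the band amplitudes are all hypothetical. Each inner list stands for the output of a frequency analyzer for one 10-second epoch.

```python
from statistics import mean, pstdev


def spectral_versatility(epoch_spectra):
    """Mean, over frequency bands, of the coefficient of variation across epochs.

    epoch_spectra: list of epochs, each a list of band amplitudes
    (same bands, same order, in every epoch). This index is an
    illustrative stand-in, not the measure used by Walter.
    """
    if len(epoch_spectra) < 2:
        raise ValueError("need at least two epochs")
    n_bands = len(epoch_spectra[0])
    if n_bands == 0 or any(len(s) != n_bands for s in epoch_spectra):
        raise ValueError("every epoch must have the same, nonzero number of bands")
    variation = []
    for band in range(n_bands):
        values = [spectrum[band] for spectrum in epoch_spectra]
        band_mean = mean(values)
        # A band with zero mean amplitude contributes no variability.
        variation.append(pstdev(values) / band_mean if band_mean > 0 else 0.0)
    return mean(variation)


# Hypothetical amplitudes in three bands over three successive epochs.
stable_subject = [[1.0, 4.0, 2.0], [1.0, 4.0, 2.0], [1.0, 4.0, 2.0]]
variable_subject = [[1.0, 4.0, 2.0], [3.0, 1.0, 2.0], [2.0, 2.0, 5.0]]

print(spectral_versatility(stable_subject))
print(spectral_versatility(variable_subject))
```

A spectrum that is "much the same from one short epoch to the next" yields an index of zero under this definition, and any change between epochs raises it.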

The weight of evidence indicates 
that the alpha rhythm is unrelated to 
test intelligence. The findings of 
Rey and of Walter await confirma- 
tion. 


EEG During Mental Effort 


Evidence is found in the early EEG 
literature (121) that intellectual ac- 
tivity tends to be accompanied by 
alpha blocking and an increase in 
beta activity. Similar observations 
have been reported a number of 
times (66, 101, 120, 134, 172, 186). 

Knott (101) reported that the 
mean brain-wave frequency increases 
and the distribution becomes more 
skewed to the high end during periods 
of silent and oral reading as com- 
pared with periods of rest. No dif-
ference was observed between the
effects of oral and silent reading.
Hadley (66) reported a related in-
crease in mean brain-wave frequency
and heart rate during the solution of
mental arithmetic problems. There
was no relationship between EEG or
heart rate changes and the difficulty 
of problems; muscle potential ac- 
tivity, on the other hand, did vary 
with problem difficulty. 

Liberson (120) recorded the EEG 
during a word association test. He 
reported a relationship between alpha 
blocking time and reaction time, but 
neither complete data nor coefficients 
of correlation were given. He found
no differences in alpha blocking time
for emotional as compared with
nonemotional stimulus words. Mar-
tinson (134) found no relationship
between alpha frequency, alpha in-
dex, and mental blocking.

In contrast with the findings of
others, Toman (180) found no
[illegible] or consistent EEG changes
during the solution of mental arith-
metic problems in 64 medical stu-
dents. He reported that initial alpha
blocking did occur, but considered
it as a response to the stimulus only
and not related to the problem-solv-
ing process. He noted that there
were [illegible] individual differences
in EEG pattern changes. It is dif-
ficult to reconcile Toman’s report
with the demonstrations by Knott
and Hadley of significant changes in
the EEG frequency spectrum, espe-
cially since Hadley’s subjects were
performing the same kind of opera-
tions as Toman’s.

Kennedy and co-workers (92, 93,
[illegible]5) have reported recording an
independent brain rhythm associated
with intellectual processes. With
[illegible] electrodes on the temples,
a [illegible] cps rhythm of around 20 µV
amplitude occurring in spindle-
shaped bursts is recorded. These
have been named kappa waves.
Kappa activity is markedly accen- 
tuated during reading, mental arith- 
metic, difficult discriminations, learn- 
ing and memory tasks, and problem 
solving. It is most pronounced dur- 
ing recall of previously learned ma- 
terial. 

Kappa waves could be recorded in 
only about half of the subjects tested. 
Changing of electrode placements 
and retesting did not result in record- 
ing kappa waves in subjects who 
had yielded negative results initially. 

Despite similarity of frequency 
and amplitude, kappa waves are 
apparently not a form of alpha 
activity, since they do not react to 
stimuli as the alpha rhythm does, and 
they wax during intellectual activity 
whereas the alpha rhythm wanes. 
Kennedy et al. have presented evi- 
dence indicating that kappa waves 
are of cerebral origin (arising in the 
temporal lobe, they believe). 

Bickford (personal communica- 
tion), however, believes that eye- 
movement artifact has not been 
satisfactorily excluded in the genesis 
of these waves. The observation by 
Teitelbaum (178) of spontaneous 
rhythmic ocular movements, pre- 
dominantly in the horizontal plane, 
during mental concentration is per- 
haps significant in this regard. Un- 
fortunately Teitelbaum did not meas- 
ure the frequency of the movements. 

Kennedy’s results have been nei- 
ther confirmed nor refuted by other 
workers. The last publication con- 
cerned with kappa waves from any 
laboratory appeared in 1949. 


Alpha Responsiveness and Mental
Imagery


During the last decade studies of
the physiological concomitants of 
mental imagery have been conducted 
by workers at the Burden Neuro-
logical Institute (62, 168, 169). The
simplest exposition of this work is
to be found in Walter’s book The 
Living Brain (203). 

Briefly, modes of imagery have 
been related to two physiological 
variables, alpha-type and respira- 
tion. The type of imagery which a 
subject is disposed to use is deter- 
mined by questioning him about his 
methods of undertaking certain as- 
signed mental tasks. Two modes of 
imaging are distinguished, the visual 
and the verbal-kinesthetic. Other 
modes, it is stated, are unusual (168). 
Generally subjects can be classified 
as (a) either visualists or verbalists 
who find it difficult to shift from one 
mode to the other, and (b) those who 
use predominantly one mode, but can 
shift without great difficulty. The
latter are in the majority. 

Three alpha-types are distin- 
guished: (a) the P (persistent) type, 
showing a strong alpha rhythm which 
persists in spite of stimulation or 
intellectual activity, (b) the R (re- 
sponsive) type, showing a good alpha 
rhythm during relaxation which re- 
sponds readily (blocks) upon stimu- 
lation and during intellectual activ- 
ity, and (c) the M (minus) type, 
showing no alpha rhythm or only a 
few low voltage alpha waves even 
during relaxation. Walter points out 
that most studies in the literature 
relating alpha activity to various 
psychological processes have involved 
only subjects showing strong alpha 
rhythms in order to facilitate fre- 
quency and amplitude measurements. 
This has resulted in the almost com- 
plete neglect of the M group. 

Two types of respiratory activity 
are distinguished, regular and irregu- 
lar. 

The findings of Walter and his 
colleagues have been essentially that 
subjects of the P alpha-type are 
predominantly verbalists; during in-
tellectual activity their breathing is
irregular, apparently due to activity
of the vocal apparatus. Subjects of
the M type are predominantly visual-
ists, and during intellectual activity
their breathing is regular. Subjects
of the R type are either visualists 
or verbalists, but can usually shift 
to the other mode without great 
difficulty. The M and P types are 
more successful in executing motor 
tasks when depending only on stereog- 
nosis for sensory cues (169); such 
skill is attributed to their consistent 
use of one type of imagery, which 
type being apparently of little im- 
portance. 

These findings recall, and indeed 
appear to support, Adrian's visual 
attention hypothesis (1; see above).
They also raise a number of ques- 
tions (203). For example, what is the 
origin of the differences observed? 
(There is some evidence that alpha- 
rhythm characteristics are heredi- 
tary.) At what stage of development 
do differences in alpha responsiveness 
become apparent? (Strong alpha 
rhythm is commoner among children 
than among adults, M-type EEG’s
less so.) To what extent do the ways 
of thinking imposed by these modes 
of imaging affect or determine per- 
sonality? 


Hypnosis 
EEG Changes During the Induction of
Hypnosis


As the subject passes from the
normal into the hypnotic state no
changes can be detected in the EEG
pattern by inspection (10, 43, 50,
72, 121, 129, 170). This evidence
would indicate that the hypnotic
state does not differ physiologically
from normal wakefulness. Physio-
logical sleep, however, can be in-
duced by hypnotic suggestion, or
may occur spontaneously, under ap-
propriate conditions (9, 10, 72).

Darrow et al. (35) have reported
that measurement of phase relation-
ships between tracings derived from 
the motor and occipital areas of the 
brain under hypnosis (as compared 
with normal wakefulness) shows a
small, but statistically significant,
increase of average in-phase corre-
spondence during expiration. A
comparison of frontal and motor trac-
ings shows a small, but statistically
significant, increase in “parallelism
or synchrony.” Similar changes occur
during the onset of normal sleep in
some subjects (36). They concluded
that “hypnosis is not differentiated
by this criterion from ‘hypnoidal’
states preceding and following sleep.”
These observations have not been 
repeated by other workers. 

Gerebtzoff (59) induced cataplectic
states in rabbits by “fascination,”
ocular fixation, or turning. The
spontaneous electrical activity of the
cortex was replaced by a slow-wave
pattern indistinguishable from that
of normal sleep. Cortical responses
to [illegible] stimuli were markedly
[illegible], and sometimes only the
[illegible] discharge was seen. By this
evidence such states in animals do
not correspond to hypnotic states in
man.


EEG Changes Under Hypnotic Sug- 
gestion 


Three out of four studies have
shown that, when hypnotic sugges-
tion is made that a visual stimulus
is present, the alpha rhythm blocks
even though no objective stimulus
is presented (9, 121). Lundholm and
Lowenbach (129) failed to elicit such
a response. They also obtained nega-
tive results with auditory stimuli.
The authors were unable to explain
the difference between their results
and those of earlier studies.

On the other hand, suggestion of
blindness does not prevent alpha
blocking when a light stimulus is
actually presented, even though sub-
jectively the light is not seen (9, 50,
121, 129). Loomis et al. (128), how-
ever, reported that suggestion of
blindness to a hypnotized subject 
with eyes open in a lighted room 
was followed by the appearance of 
the previously absent alpha rhythm. 
They also reported that suggestion 
of anesthesia did not prevent alpha 
blocking following a pin prick, even 
though the prick was not felt. 

Barker and Burgwin (9, 10) re- 
ported successful induction of sleep 
patterns in the EEG by hypnotic 
suggestion, if the subject were prop- 
erly prepared. Subjects instructed 
to remember events during hypnosis 
did not remember events occurring 
while the EEG showed sleep patterns 
as well as events occurring while the 
EEG showed wakefulness patterns 
(10). 

By hypnotic suggestion to relax, 
Ford and Yeager (50) reported the 
induction of “good” alpha patterns 
in several subjects with anxiety 
states, whose previous EEG’s showed 
little or no alpha rhythm. Relaxation 
suggestions were not followed by 
EEG changes in subjects whose 
EEG’s naturally showed “good” 
alpha rhythm. These investigators 
also regressed two subjects, who 
had undergone craniotomies, to their 
preoperative periods. Their EEG's 
retained postoperative patterns. 


Emotion 


EEG Patterns Associated with Emo- 
tional Reactions 


In 1948 Lindsley (122) reviewed 
the literature pertaining to emotion 
and the EEG. He concluded that 
“under conditions involving some 
degree of emotional arousal, as in
apprehension, unexpected sensory
stimulation, and anxiety states, two
principal kinds of changes are re-
flected in the EEG: (a) a reduction
or suppression of alpha rhythm, and 
(b) an increase in the amount of 
beta-like fast activity.” These 
changes will by now be recognized as 
constituting part of the activation 
response of Magoun, Lindsley, et al.

In recent years several investiga-
tors have studied emotional states
with the newer techniques of photic 
flicker stimulation and automatic 
frequency analysis. Ulett et al. (189) 
correlated the EEG’s of 191 sub- 
jects, recorded under standard and 
photic stimulation conditions, with 
an 8-point rating scale for anxiety 
proneness derived from psychiatric 
interviews and psychological testing. 
Significant correlations were found 
between anxiety-proneness ratings 
and (a) amount of subjective dys- 
phoria during photic stimulation, 
(b) amount of harmonic EEG re-
sponse in the 20-30 cps range to
flicker frequencies [illegible] and [illegible] that rate,
(c) displacement of the centroid of 
driving from the normal range, and 
(d) percentage of abnormal and 
low alpha records. A check list of 
EEG anxiety-indicators correlated 
with the criterion ratings .48. Under 
experimental stress (threat of electric 
shock) the occipital driving response 
acted the same way as does the 
alpha rhythm, that is, its “power” 
was reduced. However, there was no
difference between subjects with anx-
iety and those without, with respect
to the magnitude of the effect.

Ulett and Gleser (188) developed
from their data 3 empirical scales 
designed to differentiate the anxiety- 
prone. The 3 scales were based on 
the basic EEG record, EEG re- 
sponse to photic stimulation, and 
subjective sensations induced by 
flicker. Each of the scales identified 
more anxiety-prone than nonanxiety- 
prone subjects from the original ex-
perimental population. Used in com-
bination, they identified 59.4 per
cent of the anxiety-prone normal
subjects and 65.5 per cent of the
anxious patients, with only [illegible] per
cent “false positives” (nonanxiety-
prone subjects). These scales were
cross validated on an independent 
sample of 110 (190). The scales 
based on basic EEG record and sub- 
jective sensations induced by flicker 
held up, that based on EEG response 
to flicker did not.

Knott and Correll (104) reported
a number of driving effects at funda-
mentals and harmonics of omega
frequencies in 14 stutterers. Al-
though these differed significantly
(1% level) from effects exhibited by
controls, they were not striking and
did not appear to be of the type de-
scribed by Ulett in the anxiety-prone.
In contrast with Ulett’s finding
suggesting lower “alpha power” in
the anxiety-prone than in the non-
anxiety-prone, it was found that
normal speakers showed slightly [illegible]
relative alpha-voltage than the stut-
terers.

Tracings derived from nasopharyn-
geal electrodes, presumably picking
up activity of hypothalamic [illegible],
show changes similar to those de-
rived from cortical (scalp) electrodes
(122). It is, however, quite likely
that nasopharyngeal tracings actual-
ly represent the activity of the
mesial-inferior temporal cortex more
than that of the brain stem.

Slow waves have several times
been reported to be associated with
emotional states. Lindsley (122)
feels that those reported in the early
literature were probably artifactual
or associated with skin electrical
changes of autonomic origin.

More recently Walter has re-
ported (202) that emotional dis-
turbances arising during flicker stim-
ulation experiments are [illegible]
associated with brain waves in the
temporal region of the brain at sub-
harmonics of the stimulus frequency
and usually in the theta (4-7 cps)
frequency band (cf. the findings of
Ulett et al.). If, Walter reports, the
subject “abandoned himself to the
emotional tide,” or if further emo-
tional aggravation were added, the
temporal component was accen-
tuated. But the response was not
entirely under the control of the sub-
ject or the experimenter. Spontane-
ous fluctuations in mood influenced
it in the same direction as
[illegible]-term changes in attitude.


An Activation Theory of Emotion 


Lindsley concluded his 1948 paper
with brief speculations about
[illegible] cortical [illegible] as mech-
anisms for the reflection of emotional
[illegible] in the EEG. In his chapter
on emotion in Stevens’ handbook
(123) he expanded these into an
activation theory of emotion,
based on the following: (a) the ob-
servation that the EEG in emotion
[illegible] an activation pattern, (b)
the known physiological mechanisms
of the activation response, and (c)
the [illegible] that the mech-
anism of the basal diencephalon and
[illegible] reticular formation,
which [illegible] motor outflows
and [illegible] objective features of
emotional expression is either iden-
tical with or overlaps the EEG
activating mechanism [illegible] which
[illegible] the cortex. After outlining
the [illegible] evidence support-
ing these basic points, he presents a
schema to account for varieties of
emotional response and experience in
terms of a hierarchy of neural mech-
anisms involving progressively
higher levels of the neuraxis. The
possible roles of the reticular struc-
tures and the limbic lobes (6, 150)
are emphasized.


Mechanisms for ‘Functional’ Effects 
of Emotion on the Brain 


Darrow (32, 33) has emphasized 
the effects not only of neural, but 
also of hydrostatic, chemical, auto- 
nomic, and humoral mechanisms 
during emotional states. The balance 
of such influences, he feels, deter- 
mines cortical reactivity. In a num- 
ber of experiments using elaborate 
techniques for recording EEG and 
peripheral autonomic activity, he has 
demonstrated interaction of cortical 
and autonomic activity. He hypoth- 
esizes that in moderation the various 
mechanisms for homeostatic regula-
tion of the brain “serve to regulate
and to terminate cortical excitation 
and to prevent self-perpetuating, 
circular, perseverating, and rumina- 
tive activities within the feltwork of 
the cortex,” but that either under- 
or overactivity of these mechanisms 
may “embarrass” cortical function. 
Various physiological conditions may 
increase susceptibility of the cortex 
to subcortical influences, and in such 
cases changes in autonomic activity 
incidental to emotional disturbance 
become crucial, sometimes resulting 
in what he has called “relative func- 
tional decortication.” Darrow sug- 
gests that the treatment of the latter 
conditions, which are identifiable by 
his testing techniques, is to improve 
cortical function, so that the cortex
may be less readily influenced by
subcortical processes. In this he is
opposed to the practice of using
sedatives routinely in cases of exces-
sive anxiety or tension. It is cer-
tainly true that in many such cases
subhypnotic doses of sedatives in-
crease, rather than decrease, the
patient’s discomfort. This is es-
pecially true with many hyperactive
children, who may become more
agitated under mild doses of seda-
tives.

In a recent paper (34) Darrow
reports data which appear to indicate 
that in brain-injured or emotionally 
unstable individuals stimulation 
tends to increase slow alpha-like or 
theta activity in the precentral 
parts of the brain, which he refers 
to as increased anterior dominance. 
He reports that this effect can be 
conditioned as can the more fre- 
quently reported alpha-blocking ef- 
fect. In more stable nervous systems, 
“activation” and postcentral alpha
“facilitation” prevail. Anterior domi- 
nance he interprets as re 
excitatory condition, 


dominance, regulation. He concludes 
that his evi 


repeated by other 
let et al. 


directly related to th 
sympathetic tone, 
onstrated that Peripher 
ic discharge does not b 
cortical activation 
rather by action on the pontomes- 
encephalic structures. Small doses 
of sedatives suppress these effects.


Emotional Aggravation of Epileptic 
Processes 


It is well known that some epi-
leptics will have seizures more fre-
quently under emotionally disturb-
ing conditions than under emotion-
ally stabilizing conditions. Several
workers have published data in- 
dicating an increase in EEG pathosis 
in epileptics during periods of emo- 
tional disturbance. 

Barker and his colleagues have re-
ported a number of cases (7, 8, 1[illegible]).
In some epileptics startle, and other
disturbing stimuli, as well as verbal
stimuli aimed at the patient’s parti-
cular emotional conflicts, will be
followed by bursts of abnormal waves
in the EEG. Both diffuse paroxysmal
(spike-and-wave) and focal abnor-
malities can be so activated. [illegible]
epileptics and controls show only
alpha blocking with or without mus-
cle potential. Berlin and Yeager
(16), and Higgins et al. (77), have
reported similar observations.

The mechanisms of such responses
are not known, but Darrow’s hypoth-
eses would appear to be pertinent.


Personality 


Proposed relationships between
brain-wave phenomena and normal
personality traits or variables will
be discussed here. EEG findings
among patients with “functional”
mental disorders are discussed in
detail elsewhere (46).

In 1938 Gottlober (63) published
data which appeared to indicate a
positive relationship between ex-
traversion and high alpha index
(the percentage of time that a
[illegible] alpha rhythm is present in the
[illegible] of a relaxed, awake subject).
Henry and Knott (73) observed that
Gottlober’s sample was loaded with
both extraverts and high alpha in-
dices. With additional data they
were unable to find any relationship
between alpha index and extraver-
sion or introversion.
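
As a rough illustration of the alpha index as defined above (percentage of time that alpha rhythm is present), the following sketch computes it from a record that has already been scored interval by interval. The function name, the one-flag-per-interval scoring scheme, and the sample record are assumptions made for the example; how "present" is judged is left to the scorer.

```python
def alpha_index(alpha_present):
    """Percentage of scored intervals in which alpha rhythm was judged present.

    alpha_present: iterable of booleans, one per equal-length interval
    of the resting record (hypothetical scoring scheme).
    """
    flags = list(alpha_present)
    if not flags:
        raise ValueError("no scored intervals")
    return 100.0 * sum(1 for flag in flags if flag) / len(flags)


# Hypothetical scoring of ten successive intervals of a resting record.
record = [True, True, False, True, True, True, False, True, True, True]
print(alpha_index(record))
```

Studies that split subjects into high- and low-index groups, as several of those reviewed here do, apply a cutoff to a figure of this kind.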
Saul et al. (162, 163) proposed
several relationships between normal
EEG patterns and certain person-
ality characteristics: (a) Passive in-
dividuals tend to have regular,
persistent alpha rhythms of high
index, and “masculine competitive”
individuals tend to have low voltage,
weak alpha rhythms of low index.
They defined passivity as connoting
“dependence, submissiveness, the de-
sire to receive from others, and readi-
ness to retreat from dangers, effort,
and responsibility. The meaning of
the term can be defined further by
contrast with the antithetical con-
stellation of independence, drive,
dominance, activity, and masculin-
ity.” (b) “(F)rustrated, demanding,
impatient, aggressive, hostile wom-
en” tend to have “mixed” type
EEG's (alpha index between 25 and
[illegible] plus other waves both faster and
slower than alpha) or “mixed fast”
type EEG's (alpha rhythm over 10.5
cps plus waves faster than alpha).

The first proposition has been
often cited in the literature (70, 117,
118, 142, 148, 159, 160, 161), and
appears to have been accepted as
fact. Sisson and Ellingson (171) re-
viewed the evidence upon which that
proposition was based and found it
unconvincing.

Rubin, Bowman, and Moses (142,
160, 161), subsequent to the original
paper of Saul et al., published EEG-
personality studies on several groups
of psychosomatic patients. These
appeared to support a high-alpha-
index-passivity relationship. These
studies too were criticized by Sisson
and Ellingson on methodological
grounds and on the basis that the
conclusions were based to some ex-
tent on circular reasoning.

There have also been some at- 
tempts to relate Rorschach scoring 
categories to the alpha index. Travis 
and Bennett (182) compared two 
groups of normals with alpha in- 
dices over and under 50. They found 
that the high-index group gave 
significantly more whole responses 
(W), but was significantly lower 
than the low-index group with re- 
spect to R, Dd-plus-S%, sum C, and 
W, and took significantly more time. 

Brudo and Darrow (22) found 
significant rank-order correlations 
between M and alpha index of .532 
for 11 normal children, .636 for 10 
behavior problem children with pos- 
sible brain damage, and .619 for the 
two groups combined.
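
A rank-order correlation of the kind reported by Brudo and Darrow can be sketched as a Pearson correlation computed on ranks, with tied scores given their average rank. The function names and the scores below are hypothetical and are not data from that study.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_order_correlation(x, y):
    """Rank-order (Spearman) correlation: Pearson r computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of at least two cases")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for six subjects (NOT data from Brudo and Darrow).
movement_responses = [0, 1, 1, 2, 4, 5]
alpha_indices = [20, 35, 30, 55, 50, 80]
print(round(rank_order_correlation(movement_responses, alpha_indices), 3))
```

With samples of 10 or 11 cases, as in the study cited, a coefficient of this size rests on very few observations, which is one reason such findings call for replication.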

Sisson and Ellingson (171) com- 
pared two groups of 15 male neuro- 
psychiatric patients with alpha in- 
dices over 90 and under 10, respec- 
tively, on 20 major Rorschach scor- 
ing categories. The groups did not 
differ significantly with respect to 
age or distribution of neuropsychiat- 
ric diagnoses. None of the differ- 
ences between the two groups on 
Rorschach scores were significant 
(p's .30-.50). After discussing the 
limitations of using the Rorschach 
test in this manner to investigate 
relationships between personality and 
other variables, they concluded that 
“no study has been done conclusively 
showing a relationship between any 
feature of the normal adult EEG re-
corded under standard conditions
and any personality trait or vari- 
able.... Since alpha and beta ac- 
tivity appear to be quite primitive 
functions of neural tissue, we find it 
difficult to believe that any of their 
measures will be found to correlate 
with any of the dimensions of so
complex and phylogenetically recent
an entity as the human personality.”


CONCLUSION


It must be clear even from this re-
view, limited to papers dealing with
relationships between brain waves
and psychological processes, that ac-
tivity in the field of neurophysiology
since World War II has been pro-
digious. Anatomists, physiologists,
psychologists, neurologists, and psy-
chiatrists have all contributed. Each
new discovery seems to reveal the
brain as an even more versatile organ
than was previously appreciated.
The major advances in the areas
discussed have unquestionably stem-
med from the delineation of anatomi-
cal connections and physiological
functions of the reticular formation
of the lower brain stem and the
diffuse thalamic projection system.
The research possibilities opened up
by these discoveries have been far
from exhausted.


ROBERT J. ELLINGSON


It will have been recognized that
while the body of confirmed and es-
tablished facts relating EEG phenom-
ena and psychological processes
has been considerably increased since
Lindsley's review of 1944, the body
of unconfirmed data has likewise in-
creased. Unconfirmed observations,
especially impressionistic as con-
trasted with quantified ones, are
more likely to be negated than con-
firmed. This situation is hardly
unique, but caution cannot be over-
emphasized.

Theorizing too has proceeded
apace. Not all of the theories of-
fered can be correct. It is obvious,
for example, that too many functions
have been attributed to the alpha
rhythm; it has perhaps received an
undue amount of attention, a tend-
ency which seems to be gradually
diminishing. Some of the theories
which have been outlined here are
already obsolescent, but it can at
least be said for most of them that
they have inspired research.

Where the next few years will lead
it is impossible confidently to pre-
dict. The mechanisms of the sleep-
wakefulness cycle, of consciousness
itself, are within our grasp. An
approach is being made to the central
nervous mechanisms of sensation,
perception, and elementary motor
activity. The so-called higher mental
processes still appear to be beyond
reach. Present neuropsychological
techniques do not seem to be ade-
quate to deal with them. Little
practical assistance can be [illegible]
to the psychologist investigating
processes of learning or thinking, or
to the psychiatrist [illegible]
[illegible] are even
[illegible] tech-
niques [illegible] com-
plex of all problems [illegible]


VICARIOUS TRIAL AND ERROR AND RELATED BEHAVIOR


A number of experimenters have
noted a more or less typical and
[illegible] pattern of behavior which
occurs at the point of choice in the
discrimination box, maze, or during
the [illegible] process in visual discrimi-
nation studies employing jumping
[illegible]. This pattern has been vari-
ously described as “choice by nega-
tion or comparison,” “looking to the
[illegible] before choice,” “sway-
ing back and forth,” “jumping to-
ward and away from,” “head move-
ments,” “partial elimination,” etc.
To this general pattern of behavior
Muenzinger and Fletcher (32) have
given the name vicarious trial-
and-error, abbreviated VTE.
Although first used to label choice-
point behavior of rats prior to spa-
tial or nonspatial discriminative re-
sponses, the term VTE has subse-
quently been extended to vacillatory
behavior in conditioning (17), reason-
ing (38), place vs. response learning
([illegible], 56), delayed reaction
(24), conflict (2), and even nonchoice
(43) situations. Moreover, not only
the [illegible] or choosing be-
havior of rats but also of dogs (17),
monkeys (21), children (2, 15, 16),
and adults (51) in various situations
has been labeled “VTE-
ing.” Thus, VTE behavior now re-
fers to the vacillatory behavior of
a [illegible] of Ss at points of choice
throughout a range of situations.

On the theoretical level, VTE has
been interpreted as a behavioral
definition of consciousness (45, 46,
47), a catalytic process which aids
learning (48, 49, 51, 52), a form of
symbolic exploration (29), an ana-
logue or mechanism of reasoning (26,
30), “overt thinking” (9), a be-
havioral index of conflict (2, 6, 19, 28,
42, 57), or a preparatory response
(43). Tolman (48, 49, 51), particu-
larly, and Barker (2), Schlosberg and
Solomon (41), Taylor and Reichlin
(43), and Austin (1) have advanced
relatively complex and systematic
explanations of the VTE phenom-
enon.

Now with Human Resources Research
Office, The George Washington University.

On the empirical level, with the ex- 
ception of the brief treatments of 
VTE by Muenzinger (30), Tolman 
(48, 52), Dennis and Russell (11), 
Munn (35), Brogden (5), and Wood- 
worth and Schlosberg (62), there has 
been no adequate systematic and
critical analysis of experimental pro-
cedures and findings. Nor has the
considerable work in this area re-
sulted in an exegesis and comparison
of the several theories of VTE be-
havior. Theoretical interpretations
of VTE behavior, however, will be
treated in a later paper. The pur-
pose of the present paper is to sum-
marize and critically evaluate the em-
pirical material relating to VTE.
The presentation and analysis will
be divided into the following general
topics: criteria for VTE, [illegible]
and response correlates of VTE, and
VTE and learning efficiency.


CRITERIA FOR VTE 


Descriptive Behavioral Manifestations of VTE


Muenzinger and Fletcher proposed 
the term “vicarious trial-and-error”
as a label for the behavior “of white 
rats at the point of choice in a very 
difficult discrimination,” where “the 
rats would stop and move their heads 
from alley to alley as if to gain a suc- 
cessive impression of the two stimuli 
to be discriminated” (32, p. 89). 
Later Muenzinger described this re- 
sponse alternation at the choice point 
as follows: 


The most common way is for the rat to stop 
at a midpoint between the alleys and turn his 
head first toward one and then toward the 
other alley. But he may also approach the en- 
trance to the alley and orient his whole body 
toward it and then turn and approach the 
other alley in a similar way. If electric shock 
is used as punishment he may stretch his body 
over the electric grid without touching it. 


A number of investigators subse- 
quently have employed Muenzinger’s 
term, VTE, in referring to or describ- 
ing selected aspects of choice-point 
behavior of the rat. For example, 
Tolman accepted the term VTE to 

designate the “ ‘lookings or runnings
back and forth’ which often appear
at the choice point, and which all
rat-runners have noted, but few


Reichlin (43) h 
show that behavior similar to that
which Muenzinger and Tolman have
designated as VTE occurs prior to
jumps over a 6- to 8-inch gap in a
nonchoice situation.

Up to the present time, no single 
or standard set of criteria (defi- 
nitions) of VTE behavior has been 
established and used consistently, 


o that 


This deserves special emphasis be- 
cause a number of investigators using 
different defining and/or scoring cri- 
teria have made cross references ap-
parently without recognizing that
the comparability of these criteria 
has not been established. Further- 
more, it has often been difficult to 
ascertain from reading a particular 
article what defining and/or scoring 
criteria were employed by the investi- 
gator. 


Scoring VTE Behavior 


VTEing or VTE-like behavior has 
been quantified or scored by criteria 
which will be designated here as 
VTE units or VTE trials. The VTE 
unit is defined as behavior which in- 
volves (a) looking at or facing one 
side or card and then turning toward 
and looking at (or facing) the other 
side or card before making a choice, 
or (b) looking at or facing one side 
or card then turning toward and look- 
ing at the other side or card and then 
returning to the first card before 
making a choice. The former is desig- 
nated as the AB unit, the latter as 
the ABA unit.

The VTE trial is defined as any 
trial during which one or more VTE 
units were recorded; thus, the VTE 
trial may contain one or more AB or
ABA units. As an additional seman-
tic clarification, in this paper the
general or unqualified terms VTE,
VTE behavior or VTEing will be 
used occasionally as equivalent to 
nonquantitative descriptive phrases 
such as “rapid vibrations of the 
head,” and “swaying back and 
forth,” or as general categorical 
terms for either or both VTE trials 
or units.
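
The scoring definitions above can be illustrated with a short computational sketch. It is offered only as an illustration and is not a procedure reported by any of the investigators cited. It assumes, hypothetically, that each trial has been recorded as the ordered list of sides (or cards) the animal faced before making its choice, and it counts every side-to-side shift as one AB unit, so that an ABA sequence contributes two AB units. The function names and the record format are assumptions of this sketch.

```python
from itertools import groupby


def ab_units(facings):
    """Count AB units: shifts from facing one side to facing the other.

    `facings` is the ordered list of sides faced before the choice
    (an assumed record format, e.g. ["L", "R", "L"]).
    """
    # Collapse repeated observations of the same side into one entry.
    collapsed = [side for side, _ in groupby(facings)]
    return max(len(collapsed) - 1, 0)


def aba_units(facings):
    """Count ABA units, taken here as completed out-and-back pairs of shifts."""
    return ab_units(facings) // 2


def is_vte_trial(facings):
    """A VTE trial is any trial on which one or more VTE units were recorded."""
    return ab_units(facings) >= 1


def summarize(trials):
    """Derive trial-based measures from AB-unit counts for a set of trials."""
    if not trials:
        raise ValueError("at least one trial is required")
    units = [ab_units(t) for t in trials]
    vte_trials = sum(1 for u in units if u >= 1)
    single_unit_trials = sum(1 for u in units if u == 1)
    return {
        "trials": len(trials),
        "vte_trials": vte_trials,
        "pct_vte_trials": 100.0 * vte_trials / len(trials),
        "ab_units_per_trial": sum(units) / len(trials),
        # Share of VTE trials that rest on only one AB unit (None if no VTE trials).
        "pct_vte_trials_single_unit": (
            100.0 * single_unit_trials / vte_trials if vte_trials else None
        ),
    }


if __name__ == "__main__":
    records = [["L"], ["L", "R"], ["R", "L", "R"], ["R"], ["L", "L", "R", "R"]]
    print(summarize(records))
```

Because the VTE trial count, the per-trial mean, and the share of single-unit VTE trials are all derived from the AB-unit counts, the sketch shows in a small way why the smallest unit is a convenient basis for scoring.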

It is difficult, if not impossible, to 
estimate the distortions or discrepan- 
cies in the interpretation of experi- 
mental data and in the comparisons 
of different studies which may have
been occasioned by the use of differ-
ent criteria for scoring VTE. An ex-
amination by the present writers of
data reported by Klüver (21) for
weight, visual, and auditory discrimi-
nations of monkeys illustrates the
point more specifically. Although
Klüver used VTE trials as
a measure, in some cases his presen-
tation permitted rescoring the data
in terms of AB units. It was found
that during the acquisition of dis-
crimination responses to weight stim-
uli the AB unit and VTE trial ap-
peared to be highly related since of
the VTE trials reported ap-
proximately 90 per cent contained
only one AB unit. Although it was
difficult to ascertain this relationship
for the visual and auditory discrimi-
nations, the impression was gained
that a much smaller number of the
VTE trials for these problems were
scored on the basis of only one AB
unit. This suggests that the size of
the relationship between these differ-
ent VTE measures may be in part
a function of the particular experi-
mental problem.

An analysis in this connec-
tion of some VTE data obtained by
Wischner (59, 60, 61) disclosed that
for no-shock, shock-right, and shock-
wrong groups in a black-white, non-
correction discrimination situation,
the percentages of VTE trials based
on only one AB unit were respectively
83 per cent, 73 per cent, and 56 per
cent. The rank-order coefficients of
correlation between the number of
VTE trials and the number of AB
units were +.98 for no-shock, +.92
for shock-wrong, and +.86 for shock-
right groups. Thus, while the par-
ticular experimental conditions ap-
parently influenced the percentages of
trials which contained only one AB
unit, fortunately, Wischner’s data
suggest little, if any, influence on the


correlation of VTE trial and AB unit 
measures. 

The lack of a standard criterion of 
VTE, together with the relative lack 
of more precise information with re- 
gard to intercorrelations among the 
various measures, represent obvious 
limitations in any attempt to eval-
uate both empirical findings and
any interpretations of these findings. 
These limitations must be kept in 
mind in subsequent discussions of re- 
ported relationships and interpre- 
tations. 

As a first approach to a more pre- 
cise quantification and understand- 
ing of VTE behavior, the present 
writers suggest that the AB unit, 
that is, the smallest unit, be adopted 
as the basis for scoring, since it would 
yield all data concerning the phe- 
nomena from which one could con- 
struct other VTE measures up to 
and including the VTE trial.²


2 There are other considerations in scoring 
VTE behavior. One is the general problem of 
observing VTE's, for example, the occurrence 
of an AB unit. Thus, on some trials Wischner 
(60, 61) was unable to count and record each 
AB unit and so merely noted “much shifting.” 
Therefore, in the above analysis the number of 
AB units for those trials was arbitrarily scored 
as two units. This probably represents an un- 
derestimate. 

Such conditions as design of the entrance 
compartment, ease of delimitation of the 
choice point, spatial or cue characteristics of 
the stimuli to be discriminated, and extensive-
ness of behavior involved in making


the choice may all influence the ease of iden- 
behavior. How to treat behaviors 
or facing the 
among AB units with- 


read, “ABA, return to entrance compartment, 
A, move to center, A, etc.” While these be- 
haviors may ultimately prove to be similar or 
equivalent to what Taylor and Reichlin (43) 
have termed preparatory responses, at present 
it would seem desirable to score these addi-
tional behaviors separately.
In a situation in which VTE behavior in 
ANTECEDENTS AND RESPONSE
CORRELATES OF VTE 


Tolman is one of the few investi- 
gators who has systematically iso- 
lated and manipulated the variables 
which influence VTE. For the most 
part other relationships have emerged 
as incidental findings in experiments 
primarily concerned with other prob- 
lems. Within various contexts and 
depending on the purpose of the 
writer the data from essentially the 
same studies have been subjected to 
three kinds of analysis. First, Tol- 
man has been concerned with the 
nature of the relationships between 
VTE and errors in a variety of dis- 
crimination and maze learning sit- 
uations. A second aspect has been 
the analysis of the relative frequency 
of VTE as a function of special ex-
perimental conditions without re-
gard to the VTE-error relationship.
Finally, relationships between VTE
and other behavior measures have
received attention.


The Relationship Between VTE 
and Errors in Learning 


Tolman’s extensive treatment of 
VTE-error relationships is apparently 
related to his postulation that VTE- 
ing leads to greater learning effi- 
ciency (45, 46, 47, 48, 49, 51, 52). 
While these relationships have been 
analyzed for both discrimination and 
maze learning, support for Tolman's 
hypothesis originated in and subse-
quently was drawn from discrimi-
nation learning data. In the follow-
ing discussion the findings for these
two learning situations will be treated
separately.


children represented partial movements of a
lever before a definite choice was made, Bark-
er (2) weighted such movements for degree of
displacement. It may be fruitful to develop
techniques of scoring all types of VTE behav-
ior in terms not only of frequency but also of
amplitude.


Discrimination Learning 


Yerkes, who described the VTEing
(although, of course, he did not use
this label) of dancing mice at the
choice point in his discrimination
set-up, commented that “could we
but discover what the psychical
states and the physiological con-
ditions of the animal were during
this period of choosing, comparative
psychology and physiology would
advance by a bound” (63, pp. 130-
131). There is some suggestion in
Yerkes’ writing that, with easily
discriminable stimuli, more frequent
VTEing as exemplified by choice by
comparison appeared first and, while
remaining at a high level in some ani-
mals, in other animals decreased to
choice by negation (one AB unit).

In other early investigations Hoge
and Stocking (18) and later Lashley
(23) observed that the frequent head
movements of the early stages of
discrimination learning in the Yerkes
box gradually decreased in number
as the discrimination was acquired.
Conversely, Gellerman’s (15) quali-
tative protocol for discrimination
learning in young children indicated
that head movements had appeared
as correct responses increased in fre-
quency. Pennington (36) reported
that “choosing behavior” appeared
after 400 trials as rats began to local-
ize sounds and was most apparent
during the last 100 trials of learning.
He also indicated that Thuma (44)
had noticed similar behavior, but
the relationship of this behavior to
errors was not explicitly specified.
Drew's observation that “during
these early trials ‘vicarious trial and
error’... appeared much more fre-
quently with this group (shock at
door of right choice) than with
the others” (12, pp. 263-264) does
not provide sufficient information to
specify the course of VTEing. Similar
interpretive difficulties characterize
Girden’s report “that the elimination 
of paw flexion to LS proceeded 
rapidly with its [head and visual 
ap goad to the left and right side 
a 7a appearance” (17, 
A relation between compari-
son behavior and error elimination
has been reported for monkeys in
Klüver’s (21) string-pulling situation.
Comparison behavior consisted of
motor comparisons (successive pull-
ing of strings) and visual compari-
sons (eye movements from one box to
another). In general, Klüver’s data
for weight, auditory, and visual
stimuli suggested the existence of an
inverse relationship between errors
and frequencies of comparison or
VTE trials.

According to data gathered in col-
laboration with his students (31, 32,
33), Muenzinger (30) concluded
that the frequency of VTE trials was
highest when errors were almost
eliminated.
The most complete data concern-
ing the relationship between VTE
and errors have been marshalled by
Tolman. The relationship which he
has presented most frequently both
for groups and individual rats is a
comparison of the course of VTE behavior
and errors during the learning proc-
ess. Thus Tolman (48) has cited
data for black-white and tone dis-
criminations obtained by Gentry in
Muenzinger's laboratory, which re-
vealed that errors had decreased as
VTE trials had increased. Also, he
reported that as errors decreased
VTE trials increased in discrimi-
nation studies employing brain lesion
and normal animals, jump and no
jump conditions, and near and far
discriminanda (48).

Tolman (49) compared AB units,
correct choices, and hesitation time
for three experimental groups which

had been presented with white-black, 
white-medium gray, and white-light 
gray discriminations with white posi-
tive. While in the case of the easiest
discrimination maximum frequency 
of VTE coincided with the first error- 
less day, it is interesting to note that 
the VTE curve for the white-medium 
gray group was highest on days 8 and 
11 although the fewest errors did not
occur until the eighteenth day. Both 
error and VTE curves for the white- 
light gray group were relatively flat. 
When black-white discriminanda 
were separated by spatial angles of 
30°, 80°, and 130°, Tolman (50) also 
reported an inverse relationship be- 
tween AB units and errors during 
acquisition. In a study by Tolman 
and Minium (54) VTE frequency in 
AB units was highest on the seventh 
day although black-white discrimi- 
nation learning was not completed 
until the twenty-second day. When 
animals were then required to learn 
black-light gray and black-dark gray 
discriminations, the highest fre- 
quencies of VTE units did not occur 
simultaneously with diminished error 
frequencies. Tolman and Ritchie's 
(55) observation of an inverse rela- 
tionship between VTE and error 
curves was supplemented by another 
measure, namely the correlation co-
efficient, which they reported to be
—.65 as between totals of errors and
AB units for individual animals over
all trials.
Analysis of VTE data compiled 
by Wischner (59, 60, 61) in a non- 
correction brightness discrimination 
indicated that no-shock, shock- 
wrong, and shock-right conditions 
had differential effects on both over- 
all frequencies of VTEing and rela- 
tionships between VTE's and learn- 
ing trials. The no-shock group 
VTEed on 12 per cent of the trials 
and had a mean of 0.15 AB units 
per trial. VTE's occurred on 22 per 
cent of the trials with a mean of 0.45
AB units per trial for the shock- 
wrong animals. The values for the 
shock-right group were 36 per cent 
and 0.56 units per trial. A plot of 
frequencies of either VTE trials or 
units against tenths of trials for the 
no-shock group revealed low fre- 
quencies for the first half of the learn- 
ing process and then a rapid and reg- 
ular increase for the last half. For 
the shock-wrong condition, both 
measures increased rapidly to a 
maximum at the second or third 
tenth, declined nearly as sharply and 
then leveled off for the last three 
tenths. While trials and units also 
increased rapidly to the third or 
fourth tenth for the shock-right con- 
dition, the curves remained at fairly 
high levels for the remaining trials. 

The relationship between VTEing 
and errors for these data may be 
expressed as the correlation between 
VTE trials or VTE units per tenth 
of trials on the one hand and errors 
per tenth of trials on the other, that
is, the relationship between corre-
sponding points for Vincent error
curves and Vincent VTE curves (in
tenths of trials). The following rank-
difference correlation coefficients
were obtained for each group for the
two VTE measures, trials and units:


VTE Trials VTE Units 


No-shock 
Shock-right 
Shock-wrong 


These data suggest a positive re-
lationship between errors and both
VTE measures for the shock-right
group. The data for the shock-wrong
group are somewhat ambiguous, sug-
gesting only a slightly negative rela-
tionship for VTE trials and a rela-
tively high positive relationship be-
tween errors and VTE units. Only
for the no-shock group do the data
clearly indicate the inverse relation-
ship between VTE’s and errors which
has been stressed in Tolman’s writ-
ings. It would appear from these data
that the nature of the VTE-error
relationship in discrimination situa-
tions is dependent upon experimental
conditions such as administration of
electric shock.

Rank-order coefficients of correla-
tion between various measures of
VTE behavior (trials, total AB units,
and AB units per trial) and learning
measures (errors and trials) for in-
dividual animals of each group were
also computed for both the total
learning period and for the first 1 
trials. Of the 27 coefficients obtained,
none of the six negative values (range
of —.01 to —.33) and only two of the
21 positive values (range of +.07 to
+.71) were significant at the - a 
level. These data do not agree with
the coefficient of —.65 between errors
and VTE’s for individual animals
reported by Tolman and Ritchie
(55).

For correction animals learning a
horizontal-vertical bar discrimination
in a modified Grice apparatus, hen 
(22) reported the appearance of
“much VTE behavior from the first
day of training on until just before
the animal learned the discrimina-
tion” which “coincided with a rela-
tively constant increase in the posi-
tive reaction tendency” (22, p- 4 f

The slower learning noncorrection
animals exhibited less VTEing which,
however, also appeared when correct
responses began to depart from
chance levels on about the twelfth
day of training. Unfortunately, no
day-by-day record of VTE's was
presented, thus precluding a more
adequate assessment of the VTE-
errors relationships for the two condi-
tions.


The preceding data were for the
acquisition phase of discrimination
learning. During overlearning trials
the nature of the VTE-error rela-
tionship is apparently dependent on
the difficulty of the discrimination.
Thus it has been observed that for
easy discriminations, following at-
tainment of the criterion, VTE fre-
quency (AB units) decreases (54).
However, when difficult visual or audi-
tory discriminations are required
VTE frequency tends to remain at a
high level during overlearning trials.


Maze Learning 


Peterson observed that after a
decrease in the distance of
entering the blind alley, and
just before the entrance is eliminated
completely, there frequently occurs
a peculiar and very rapid vibration
of the rat's head between the direc-
tion of the true path and that of the
tempting blind alley (37, p. 52). In
a quantitative study of VTE
behavior in the maze, Dennis (10, 11)
found that more head movements
occurred in the second half of ac-
quisition trials. Similarly, Crannell
(8) observed an increase in hesitation
and VTEing as rats approached
criteria in multiple-path mazes.
Both measures decreased following
attainment of the criterion.
Back and forth running between
a choice point and points beyond
a screen apparently de-
creased over successive trials for
“always ran” and “sometimes ran” ani-
mals in one of Maier’s (26) maze-like
reasoning situations. Jackson’s (20)
elevated maze with jumps at
the choice point was designed to sim-
ulate the jumping requirement in the
Lashley jumping apparatus. During acquisi-
tion he found a positive correla-
tion of +.53 between VTE frequency in
AB units and errors in individual
animals, a much larger correlation of
+.78 between VTE’s and errors on
maze units, and parallel curves for
VTE, error, and time measures.

VTE and error relationships have 
also been obtained in place vs. re- 
sponse learning mazes. Thus in three 
studies by Tolman and his collabora- 
tors (38, 40, 56) it has been noted 
that for both place and response 
learners, as errors decreased VTEing 
also decreased. Similar direct or
positive VTE-error relationships were
observed for 12 and 46 hour hunger
drive place learners and for response 
learners under the same deprivation 
schedules (53). 


Evaluation 

For discrimination learning Tol- 
man has advanced the generalization 
of an inverse relationship between 
VTE and errors, that is, VTE fre-
quency tends to increase to a maxi-
mum at about the point where errors
are nearly eliminated (48, 49, 51, 52,
55). On the other hand, he concludes
that in maze learning there is a posi- 
tive relationship between VTE and 
errors (55). 

An analysis of the studies sum- 
marized above, however, suggests 
that there are data which contradict
or are not completely consistent with
Tolman’s generalizations. In this
connection a reinterpretation of the
nature of the problem for animals in
the place vs. response learning maze
will be suggested later in this section.

With respect to discrimination 
learning, it will be recalled that 
Yerkes (63), Hoge and Stocking (18), 
and Lashley (23) apparently did 
not observe more frequent VTEing 
as errors were eliminated. Wischner’s 
(61) findings for shock-wrong and 
shock-right conditions are also con- 
tradictory as were either the lack of 
correlation or the positive trends in 
correlations between VTE measures 
and rate of learning. Furthermore, 
the data for Tolman’s (49) white-
medium gray group and the Tolman 
and Minium (54) findings are not 
entirely consistent with Tolman’s 
generalization. 

In the maze, Maier’s (26) and 
Jackson’s (20) observations of a 
positive relationship between VTE 
and errors are not consistent with 
Peterson's (37), Dennis’ (10, 11) and 
Crannell’s (8) findings. These dis- 
crepancies may be due to the use of 
different types of mazes, different 
maze patterns, no jumps or jumps at 
the choice point, etc. Regardless of 
source, they raise doubts concerning 
the general applicability of the thesis 
of a parallel decrease of VTE’s and 
errors in mazes. 

Presumably because the positive 
VTE and error relationships for place 
and response learning problems were 
obtained in mazes, Tolman (53) has 
interpreted them as in agreement 
with his notion of different relation- 
ships in discrimination and maze 
set-ups. However, the use of mazes 
and the requirement of a place re- 
sponse notwithstanding, the condi- 
tions of the California studies (38, 
40, 53, 56) of place learning appear to 
approximate nonspatial discrimina- 
tion paradigms more closely than the 
spatial discriminations of most mazes.
In the usual nonspatial situation the 
positions of the positive and negative 
stimuli relative to both room and 
animal are shifted and, regardless of 
location, the animal learns to ap- 
proach one (positive) stimulus and 
to avoid or not respond to the other 
(negative) stimulus. In place learn- 
ing the position of the animals rela- 
tive to the positive and negative 
places (stimuli) is changed by moving 
the animal rather than the stimuli. 
As a consequence, on some trials the 
positive place or stimulus is on the 
left and on other trials on the right 
and the animal must learn to ap-
proach one place or cue and avoid the
other. If place learning is a variant
of discrimination learning, however,
the above-noted observations of dec-
rements in VTE units as errors de-
creased contradict the findings for
more conventional nonspatial dis-
crimination conditions.

Evidence in support of a nonspatial
discrimination learning interpreta-
tion of place learning has been re-
ported by Blodgett and McCutchan
(3) who found that animals, trained
to turn right and to approach a light
disc simultaneously, would turn left
following reversal of the disc cue.
Ritchie, Aeschliman, and Pierce (39)
also suggested that both turning
responses, presumably based on kin-
esthetic cues, and place or approach-
a-discriminable-positive-stimulus re-
sponses were acquired under place
learning conditions. Webb (58)
hypothesized that, if place and re-
sponse learning were conceived as
discrimination behavior, the greater
the difference in extramaze cues the
stronger the approach response to the
extramaze cue accompanying reward.
In accordance with this interpreta-
tion, when discriminable extramaze
cues were placed in opposition to
turning responses, the greater the
cue differences the higher the per-
centage of approaches to the positive
stimulus.


VTE as a Function of Special 
Experimental Conditions 


Similarity and other characteristics
of stimulus conditions influence VTE-
ing in various learning situations.
Also relevant are motivation and con-
flict variables, position preferences,
organic conditions, and cage-rearing.


Similarity and Other Characteristics
of Stimulus Conditions


The influence of similarity of
stimuli-to-be-discriminated on VTE
frequency has been investigated most
extensively. Variations in VTE fre-
quency as a function of spatial angle
between discriminanda, stimulus
modality, form of visual stimuli, dis-
tance of stimuli-to-be-discriminated,
distance of the jump, length of delay
of reaction or of choice, length of cul-
de-sac, and maze rotation have also
been noted.
Stimulus similarity. With respect
to the role of stimulus similarity
Yerkes observed that:


When the conditions are difficult to dis-
criminate choice by comparison occurs most
frequently and persistently. If, however, the
conditions happen to be absolutely indiscrim-
inable . . . the period of hesitation rapidly in-
creases during the first three or four series of
tests, then the mouse seems to lessen its ef-
forts to discriminate and more and more tends
to rush into one of the boxes without hesita-
tion or examination . . . (63, pp. 131-132).


The relationship between
similarity and VTE is, therefore,
curvilinear with relatively lower fre-
quencies of VTE for both easy dis-
criminations and “absolutely indis-
criminable” conditions.

It will be recalled, however, that
the “easy discrimination” black-
white group of a previously cited
study (49) learned the discrimination
faster and VTEed more than
groups which had white-medium gray
and white-light gray discriminanda.
Accordingly, Tolman concluded that,
during the acquisition of discrimina-
tions by animals, VTE frequency
varied inversely with the similarity
of visual (and by implication audi-
tory, tactual, etc.) stimuli, that is,
the less similar the stimuli-to-be-
discriminated the more frequent the
VTEing. However, these data do
not support this conclusion un-
equivocally. Thus, because the ex-
periment was terminated when the
white-medium gray group was mak-
ing only 75 per cent correct choices

and the white-light gray group was at 
chance levels, it is possible that these 
groups were not given enough trials 
to permit the occurrence of the fre- 
quent VTE's which, according to 
Tolman, just precede or accompany 
error elimination. 

Tolman (51) has also referred to 
unpublished observations that in 
humans similarity has an opposite 
effect, that is, the greater the simi- 
larity of stimuli the more frequent 
the VTEing. This apparent species 
difference was attributed to the pos- 
sibility that rats first had to ‘‘discover 
instructions” or learn what-to-do, 
whereas through instructions this 
prerequisite had already been met 
by humans. It would follow, there- 
fore, that after rats had learned what- 
to-do the inverse relationship should 
change to a direct relationship with 
VTE increasing as similarity in- 
creases. 

During test trials, administered 
after rats had learned a black-white 
discrimination, Schlosberg and Sol- 
omon (41) found a direct relationship 
between VTE (ABA units) and the 
degree of similarity of the stimuli-to- 
be-discriminated. Similarly, Brown 
(6) tested the strength of reactions, 
including the number of head move- 
ments (AB units), as a function of 
increasingly difficult discriminations 
after rats had learned to discriminate 
bright from dim. The group which 
had been shocked for errors during 
training and was one hour hungry 
during the test exhibited an increase 
and subsequent decrease in the num- 
ber of head movements as positive 
and negative stimuli were made more 
similar. This trend was predicted, 
however, as the outcome of conflict 
between a weak approach response 
and a strong avoidance tendency.
For the other groups increased dif-
ficulty of discrimination was accom-
panied by more frequent VTEing.
When retrained and then retested
on the same discrimination, all
groups exhibited increased head
movements as the discrimination
became more difficult. Because of
different training procedures and
because testing was done after the
development of discriminations, these
results are not directly comparable
to the Tolman data noted above.
Schlosberg and Solomon's and
Brown's findings are consistent with
Tolman and Minium’s (54) observa-
tions for overlearning trials that, for
difficult discriminations, VTE fre-
quency did not decrease with over-
learning but for easy discriminations
there was a decline in VTE behavior.

Klüver (21) does not present suffi-
cient data to evaluate the nature of
the relationship between VTE be-
havior and the similarity of visual
or auditory stimuli. For weights,
however, he concluded that “in
experiments requiring successive
comparisons we made the general
observation that a decrease in the
stimulus difference does one of two
things: it either abolishes the com-
parison behavior entirely or almost
entirely, or it increases the number of
comparisons” (21, p. 45).
Spatial angle between stimuli-to-be-
discriminated. Tolman (50) found
VTE frequency in units to be an
inverse function of the 30°, 80°, and
130° angular separation of black-
white discriminanda. The 30° group
exhibited more VTE's and faster 
learning than either of the other 
groups, and longer choice time than 
the 80° group. More VTE’s and 
fewer errors were recorded for the 
80° group than for the 130° group, 
while the latter group required longer 
choice times than either the 30° or 
80° groups. The longer choice times 
for the 130° group were explained as a
tendency for those animals to remain
“stuck” at a particular stimulus, and
to “forget” the alternative stimulus.
More frequent and persistent VTE-
ing after learning a black-white dis-
crimination in a T-shaped than in a
Y-shaped box has also been reported
(30).
Stimulus modality. Klüver found
that the relative frequency of visual
comparisons was greater with visual
than with auditory or weight stimuli.
The Gentry data reported by Tolman
(48) indicated that VTE frequency
is higher for auditory than for visual
discriminanda. Klüver’s data for
total frequencies of both visual and
motor comparisons with auditory
and visual stimuli appear to be con-
sistent with this conclusion.

Type of visual form. Klüver (21),
who has reported the only relevant
data on the influence of types of
visual form, found greater hesitation
and more frequent comparisons with
triangles and crosses than with
squares, circles, hexagons, and irregu-
lar forms.
Distance of stimuli-to-be-discrimi-
nated. Tolman (48) reported an ex-
periment carried out in his laboratory
in which conditions were so arranged
that the black-white discriminanda
were 23} in. away from the jumping
platform for one group and 123 in.
away for another group. It was ob-
served that VTE frequency in trials
was considerably higher for the latter
group. However, the 234 in. group
reached a level of only 75 per cent
correct responses at the termination
of training. It is possible, therefore,
that termination of training pre-
vented the appearance of a higher
frequency of VTE as the animals in
this group approached criterion.
Distance to jump. Schlosberg and
Solomon have advanced the hypoth-
esis that “anything that increases
the general tendency not-to-jump
(as, e.g., increased distance) will
increase latencies, increase VTE,
and decrease errors” (41, p. 38).
Tolman (48) found that in a black-
white discrimination a no-jump group
made considerably fewer VTE trials
than a group which was required to
jump an 8} in. gap between the
jumping and landing platforms. Sim-
ilarly, animals which had a longer
jump from the small platform in Mc-
Cord’s (24) delayed reaction set-up
“looked at” more doors during the
last 60 trials than a larger platform-
shorter jump group. It may be noted
that because of differences in plat-
form size the stimuli for the smaller
platform group were also farther
away.
Duration of delay of reaction or of choice. McCord (24) reported that
the greater the length of delay in a delayed reaction situation the
more doors faced or examined. “A tendency to pause occasionally and
regard one door or another” under delay response conditions has also
been noted by MacCorquodale (25). Muenzinger and Fletcher ascribed the
increase in learning efficiency resulting from enforced delay at the
choice point to “the longer and at times successive facings of the
stimuli to be discriminated” (33, p. 389).
Length of cul-de-sac. Dennis (10) found that frequencies of head
movements and of partial eliminations were directly related to length
of cul-de-sac.
Rotation of the maze. Following
completion of place and response
learning trials (53) VTE frequency
Creased slightly as the result of a 
0° rotation of a T maze. 


Motivation and Conflict Conditions 


If generalized to other situations, Schlosberg and Solomon’s (41)
notion that VTE increases as a function of conditions which strengthen
tendencies not-to-jump suggests that strength of punishment might be
related to VTE behavior. Punishment presumably influences fear or
anxiety as well as the strengths of the anxiety-motivated avoidance
responses of various conflict situations. Therefore, VTE frequency
might also change with variations in drive strength and types of
conflict.
Punishment (electric shock). Muen- 
zinger (30, 34) reported more VTE 
trials for shock-right and shock- 
wrong groups than for a no-shock 
group. After attainment of criterion 
VTE frequency of the more fre- 
quently and continuously punished 
shock-right group remained at a 
high level in contrast to the decline 
in VTEing of the shock-wrong group. 
Wischner (59, 60, 61) found that a 
shock-right group exhibited more 
VTE trials and AB units than both 
no-shock and shock-wrong groups, 
although the latter VTEed more 
than the no-shock group.
Brown (6) demonstrated that shifts 
in conflict situations from approach- 
approach, to double approach-avoid- 
ance (approach-avoidance tenden- 
cies to both stimuli of the conflict 
situation), to avoidance-avoidance, 
which involved increasing strength 
of punishment, led to increased fre- 
quencies of head movements. More 
vacillations occurred in the more punishing
avoidance-avoidance situations
of the Barker (2), Klebanoff
(28), and Hunt (28) experiments 
than in approach-approach conflicts. 
Fairlie (13) classified choice-point behavior in a black-white
discrimination box as no-pause, look 1 (pause but head turned in one
direction only), look 2 (looked one way and went the other), and look
3 (looked back and forth three or more times). His shock-wrong group
made considerably more no-pause and fewer look 1 responses than the
shock-right group; frequencies of look 2 and look 3 responses for the
two groups appeared to be more nearly equal.

Muenzinger and Fletcher (32) 
found that a group with hunger and 
escape from shock as motives made 
more errors and exhibited less VTE 
than a second group motivated by 
escape from shock alone. No food 
was given for the correct responses 
of either group. Because the former 
group made more errors, it was 
punished more frequently and hence 
should have made more VTE’s. 
However, the presence of the hunger 
drive, which would presumably de- 
crease VTE’s (see below) and at the 
same time prolong exploration for 
the nonexistent food, might have 
accounted for the failure to obtain 
a higher frequency of VTEing for the 
dual motive group. 

It would seem that VTEing is more likely to occur as responses are
more frequently accompanied by punishing consequences. High hunger
drive and other factors, however, may sometimes obscure this
relationship. Drew’s (12) report that rats shocked at the door for the
right choice VTE more frequently than no-shock, shock-at-wrong-door,
or shock-at-food groups suggests that the locus of administration of
shock might be one of these additional factors.

Strength of (appetitional, aversive) drive. Both Brown (6) and Tolman
and Gleitman (53) have reported data which are in accordance with the
derivation that “anything that increases the general level of
forward-going tendencies (e.g., increased hunger) will decrease
latencies, decrease VTE, and decrease the number of errors” (41, p.
38). In a series of postlearning tests the former investigator found
that animals run under conditions of 48 hours hunger drive made fewer
head movements than animals one hour hungry. The latter found a lower
frequency of VTE and errors for place and response learning animals
run under 46-hours food deprivation than for place and response
learning animals 12 hours hungry.

As noted above, Muenzinger and 
Fletcher (32) reported that a group 
motivated by hunger and escape from 
shock made fewer VTE’s than an 
escape-from-shock group. The pres- 
ence of the hunger drive, with no 
food for any response, may have had 
the dual effect of inhibiting the ex- 
pected higher frequency of VTE, 
which more frequent punishment be- 
cause of errors would presumably 
have occasioned, while at the same 
time serving to increase errors by 
prolonging exploratory behavior. As 
a consequence, this study does not 
appear to be a critical test of the generalization that VTE’s and
strength of the hunger drive are inversely related. Sears and Hovland
(42), who varied the strength of motivation for conflicting avoidance
responses, found low percentages of double reactions (responses to one
signal and then to the other) in all three of their groups of human
Ss. A high proportion of blocking responses due to equality of
response strengths was thought to account for the low percentage of
vacillation for two groups; the infrequent vacillations of the third
group were attributed in part to a disparity in the strength of the
competing reactions. Intergroup differences in double reactions were
not significant.

Type of conflict. In a study by Barker (2), children indicated
preference for one of two liquids by moving a lever in the direction
of choice. These lever movements, appearing as lines on recording
paper, were weighted for extent of displacement and summed to indicate
amount of VTE behavior. It was observed that the more equal the
preference for any two liquids, the greater the amount of VTE
behavior. Moreover, VTE scores tended to be highest when the choice
was between pairs of undesirable liquids. The latter finding is
consistent with Brown’s (6) observation of more head movements by rats
in an avoidance-avoidance conflict than in double approach-avoidance
or approach-approach conflicts. The previously cited findings of
Klebanoff and Hunt are also in accord with the Barker and Brown
results. Miller’s (28) report that Godbeer’s children Ss made more eye
movements in double-approach-avoidance than in approach-approach
situations adds further support to the conclusion that VTE behavior
decreases as the type of conflict shifts from avoidance-avoidance to
approach-approach. Hovland and Sears’ (19) observation, for human Ss,
that more double reactions occurred in an approach-avoidance conflict
than under approach-approach or avoidance-avoidance conditions
represents a slight exception to this generalization.


Position Preferences 


VTE frequency and position preferences during acquisition of
discrimination responses. Klüver (21, pp. 79-80) observed that a
monkey exhibited no comparison behavior in a weight discrimination on
the trials on which it was responding in terms of a right position
preference. Inspection of curves presented by Tolman (49) indicates
that in the early stages of discrimination learning, if position
habits were present, there was very little VTEing. As the animals
began to respond with fewer than chance errors, VTE frequency
increased. Analysis of some of Wischner’s (60, 61) data for a no-shock
group also suggests that VTE’s were less frequent during the position
habit days early in learning than on later days when position
responses were decreasing in strength. Lane’s (22) data can be
interpreted in similar fashion.

VTE frequency when negative stim- 
uli are on the position habit side. 
Wischner (60) has reported that for 
a shock-wrong and no-shock group 
respectively, 74 per cent and 84 per 
cent of the VTE trials occurred when 
the animals, as they advanced from 
the entrance compartment, happened 
to approach first the negative stimulus.
For a shock-right group, however,
this value was 42 per cent. These 
figures are for the total learning 
period. Corresponding percentages 
for the first 100 trials for the first 
two groups were 72 per cent and 70 
per cent. For the shock-right group 
this value was only 29 per cent. It 
has been suggested (59, 60) however, 
that for the latter group, the no- 
shock alley might be considered 
“positive,” particularly in the early 
stages of learning. These data sug- 
gest that the greater percentage of 
VTE trials occurred when the animal 
chanced to approach first the alley 
in which it received shock, or in the 
case of the no-shock animals, the 
alley in which it was simply frus- 
trated (no food reward). Further 
analysis of no-shock group data 
suggested that these animals tended 
to face the negative stimulus first and 
VTE when it was presented on the 
position habit side. Therefore under the specific condition of the
presentation of the negative stimulus on the position habit side VTE
trials appeared to be directly related to position preferences; this
relationship was most apparent as completed position responses
decreased in frequency.

Since a similar relationship was not observed for shock-wrong and
shock-right groups, a more complicated but as yet unspecified
interaction of relevant variables is indicated.


Organic Conditions and Cage-rearing 


Tolman (48) has reported Fried- 
man’s findings that normal animals 
made fewer errors and more VTE 
trials in learning to turn left in a sim- 
ple T than animals with a moderate 
amount of cortical lesion. In another 
California study (48), after rats 
were blinded upon completion of 
place or response learning under 12 
or 46 hour hunger drives, frequencies 
of VTE units were higher. 

Cage-rearing as one type of pre- 
experimental experience apparently 
led to greater hesitation, freezing, 
and VTEing at the choice-points of a 
9-unit T maze (7). 


Response Correlates 


Length of entrance into the cul-de- 
sac or wrong alley. Muenzinger (30) 
reported that for no-shock and shock- 
wrong groups VTEing was seldom 
followed by full penetration of the 
incorrect alley. However, the rats
of the shock-right group when mak- 
ing an incorrect response continued 
to proceed to the end of the wrong 
alley during the first 100 trials. 
Peterson (37) noted that the highest 
frequency of head movements oc- 
curred when complete entrances, 
half-entrances, and starts into cul- 
de-sacs were disappearing and/or 
had disappeared. Dennis (10) has
reported similar results. Thus, in
general, VTE frequency would ap- 
pear to increase as length of entrance 
into the cul-de-sac or wrong alley de- 
creases. With further learning, how- 
ever, VTEing should decrease and 
disappear. 

Correct responses. Referring to Dennis’ earlier study, Dennis and
Russell noted that “runs in which head movements occurred were correct
about twice as often as the average run or the immediately preceding
run” (11, p. 307). Fairlie (13) observed that look 2 and look 3
responses made by the shock-wrong group were apparently related to a
greater frequency of correct responses than the no-pause and look 1
responses; this pattern did not hold for the shock-right group. In
Wischner’s (60, 61) study, the least efficient shock-right group
showed the most VTE’s for both total learning and for the first 100
trials. Moreover, this group made correct responses on only 41 per
cent of the trials on which VTE’s occurred as compared with correct
responses on 85 per cent and 89 per cent of the trials on which VTEing
was observed for the shock-wrong and no-shock groups, respectively.
For the first 100 trials the percentages for the latter two groups
were 84 per cent and 73 per cent with the shock-right group now
responding correctly on only 28 per cent of trials on which VTE
behavior occurred. Wischner suggests, however, that if the “correct”
response for the shock-right group is considered a response of
avoiding the shock alley, VTE would then be related to “correct”
choices.
Hesitation (choice) time. As might be anticipated, a number of
investigators found that hesitation or choice time and VTE frequency
were positively correlated in discrimination, trial-and-error, and
delayed response situations (49, 20, 24). These findings are supported
by the results of a factor analytic study of rats’ behavior reported
by Geier, Tolman, and Levin (14). Further, Barker (2) and Godbeer (28)
employing children as Ss found a positive relationship between VTE’s
and the time necessary to resolve conflict situations. VTEing and
hesitation time were also directly related in Crannell’s (8)
path-elimination maze.
aons exception to the generality of 
To reene results, the fact that 
eed animals in Tolman’s (50) 
Te a reaction times but 
by. li /TE's, might be explained 
Bi the ee that the animals 
eee group _tended to get 
ai RR at one stimulus and to 
teh, the other. Other explana- 
Sic. no doubt possible. A second 
ae the breakdown in the 
Bacon: relationship between head 
as og and time to run to the 
cone p for the first test series for 
are S (9) Group A (shocked for 
Bear a ing training; tested with one 
Eet nger drive) wasconsistent with 
while Corea, formulation. Thus, 
Fisch n positive relationship be- 
olds TE’s and hesitation time 
Eo Eeay, specific experimental 
A ae conditions may Te- 
ship, qualification of this relation- 
Satie response correlates. With the 
teat ep of Group A in the first 
er aoe! Brown (6) found that af- 
NN nia was completed, head 
cup foe time to run to the food 
Alen ‘ie or choice time), 
a i of pull in attempting to 
haa of the field, and distance run 
acting into the alley from the 
correlated. int ese ee 
eee and Reichlin (43) argue 
Be ee np Mantle ag 
Beha ehavior in a discrimination 
ie Postulating that VTE 
fae are preparatory r espomses 
respo 1ave lower thresholds than full 
ies they used some of Hull's 
E to deduce several theorems 
shins ies changes in and relation- 
ohne etween various characteristics 
Sy eae of full and prepara- 
esponses in a situation not 


requiring choice between alternatives. 
Velocity of full responses was defined 
as the reciprocal of reaction latencies. 
Velocity of preparatory responses was 
equal to the number of preparatory 
movements before the final response 
plus one divided by the total time 
the animal faced the stimulus. In 
general, the observed relationships 
between the forms, maximum stand- 
ard deviations, and coefficients of 
variability of the distributions of full 
and preparatory responses agreed 
with theoretical expectations. 
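
The two velocity measures just described can be written compactly. The symbols are introduced here only for illustration and are not Spence and Reichlin's own notation: $t_L$ is the reaction latency of a full response, $n$ is the number of preparatory movements before the final response, and $T$ is the total time the animal faced the stimulus. Grouping the "plus one" with the count of movements, rather than with the quotient, is one reading of the verbal definition and should be treated as an assumption.

```latex
v_{\mathrm{full}} = \frac{1}{t_L},
\qquad
v_{\mathrm{prep}} = \frac{n + 1}{T}
```

On this reading, an animal that makes no preparatory movements ($n = 0$) still has a defined preparatory velocity of $1/T$, so both measures are reciprocals of a time per response.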


VTE AND LEARNING EFFICIENCY 


Muenzinger (30) postulated that 
VTE behavior functioned as primi- 
tive thinking or rudimentary trial 
and error to facilitate acquisition of 
correct choices. Tolman’s similar 
hypothesis, “that VTE's always aid 
the learning which they accompany” 
(48, p. 32) was later restricted to 
discrimination learning (55). 

One set of data relevant to this 
hypothesis, relationships between 
VTEing and errors in discrimination 
and maze situations, has already 
been evaluated. Because of the per- 
tinence of other findings, however, 
particularly when viewed in the per- 
spective of the importance of the 
VTE and learning efficiency hypoth- 
esis in various versions of Tolman's 
system, final examination was de- 
ferred for additional development in 
this section. Criteria for the hypoth- 
esis will be noted before considera- 
tion of its adequacy. 

Criteria for the learning efficiency hypothesis. The relationship to
which Tolman has referred most frequently as supporting the VTE and
learning efficiency hypothesis involves an increase in VTE measures to
the learning criterion accompanied by a decrease in errors (48, 49).
This relationship has been supplemented by observations, often in the
same experiments, that the groups which VTEed more frequently learned
more rapidly. A negative correlation (55) between totals of errors and
AB units for individual animals has also been employed as a criterion.
(Because such totals are dependent on number of trials to criterion,
AB units per trial is probably a more appropriate VTE measure.) The
high percentage of correct responses on trials on which VTE’s occurred
has been a fourth criterion.
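
The point about totals versus per-trial measures can be made concrete with a small sketch. It is only an illustration under stated assumptions: the three animals and their scores are invented, the function names are hypothetical, and the product-moment formula is the ordinary textbook one rather than anything taken from the studies cited. Each animal's AB-unit total is divided by its own trials to criterion before being correlated with its errors.

```python
from math import sqrt


def ab_units_per_trial(total_ab_units, trials_to_criterion):
    """AB units per trial, which removes dependence on length of training."""
    if trials_to_criterion <= 0:
        raise ValueError("trials_to_criterion must be positive")
    return total_ab_units / trials_to_criterion


def pearson_r(xs, ys):
    """Ordinary product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical animals: (total errors, total AB units, trials to criterion).
animals = [(30, 40, 100), (20, 45, 75), (10, 60, 60)]

errors = [a[0] for a in animals]
rates = [ab_units_per_trial(a[1], a[2]) for a in animals]
r = pearson_r(errors, rates)
```

With these invented figures the animals that took longer to reach criterion have lower per-trial rates, so the sign of `r` reflects VTE frequency per opportunity rather than sheer amount of training.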

Some of Muenzinger's (30) and 
Tolman’s (49) data suggest fairly
close relationships among these cri- 
teria. However, Wischner’s (59, 60, 
61) study, in which all four criteria 
were determined, does not support 
high intercriteria correlations. 

Evaluation of the learnin 
hypothesis. Inconsistencies in the 
empirical data an 
different 


e general 
esis that, 


t situations, 
VTEing per se aids learning, Thus 


it will be recalled that Yerkes’ (63) 
Hoge and Stocking’s (18), and Lash. 
ley’s (23) early observations, the 

E curves for Wischner's (61) 
shock-wrong and shock-right groups 
some of Tolman’s data (49, 54), and, 
when interpreted as $ 
learning, the VTE and 
ship of the place learn 


discrimination 


panied more 
discrimination 
as not the case 
ation, Finally, 
tary notion of 


maze (55) does not 
10, 37). Data relevan 


criteria, therefore, are inconsistent 
for both discrimination and maze 
contexts.

The predominantly positive corre- 
lations or lack of correlation between 
various VTE measures and errors 
for individual animals during the first 
100 and for all trials for Wischner’s 
three groups did not corroborate the 
r of —.65 reported by Tolman and 
Ritchie (55). Both in the maze (10) 
and in discrimination situations (30, 
59, 60, 61) regardless of the point 
in the learning sequence, the occur- 
rence of VTE behavior on a given 
trial tended to be accompanied by a
correct choice.

The general pattern of inconsistencies within situations and among
criteria requires either the further restriction of the learning
efficiency hypothesis or the development of a more satisfactory
explanatory scheme. Following the first course, that of further
restriction of the hypothesis, it can be noted that the contradictory
data for discrimination learning were obtained in apparatus other than
the Lashley jumping-stand. Therefore, it could be postulated that data
favorable to the hypothesis are more likely to be obtained in
jumping-stands. In this case, however, it would be necessary to
explain Muenzinger’s corroborative data for other situations and, more
importantly, the pertinent special features of the jumping-stand.

The mazes in which contradictory evidence was obtained differed from
those in which parallel VTE and error curves were obtained. Here too,
the hypothesis might be reformulated by the introduction of an, as
yet, unspecified situational parameter(s).

Data or inferences from data reported by Tolman (49), Lane (22), and,
particularly, by Wischner (59, 60, 61) suggest an interpretation of
corroborative evidence which does not require the hypothesis that
VTEing has a catalytic function in discrimination tasks. These data
concern the relationship of position preferences and VTEing,
particularly, for no-shock animals. Specifically, it is suggested that
strong initial position preferences prevent the occurrence of VTEing
during the initial phase of training. Later, as tendencies to go to
positive stimulus plus left or right side, and to go to negative
stimulus plus left or right side become more nearly equal, initial
position preferences may continue to determine the side first faced.
However, if the negative stimulus is on that side, turns toward the
positive stimulus should occur with greater frequency. With each
occurrence a VTE would be scored accompanied by a correct response.

This analysis suggests that any situations such as the jumping-stand,
hard discriminations, and possibly, very large spatial angles between
discriminanda, which are likely to occasion strong position
preferences, will also be marked by initial periods of infrequent
VTEing. Then, as animals begin to respond to the relevant cue with
greater than chance frequency, VTE’s are likely when the negative
stimulus is on the position preference side.

It will be recalled that for Wischner’s data it was found that
shock-wrong and, with shock-avoidance interpreted as the “correct”
response, shock-right animals tended to VTE when the negative cue was
faced first; such facings were not related to position preferences.
Thus, an explanation based on position preferences also requires
additional principles. Eschewing detailed development here, it is
suggested that


analyses of conflict (2, 6, 28), particu- 
larly of the approach-avoidance type, 
approximate the requisite principles. 

To conclude, the learning efficiency 
hypothesis seems neither generally 
applicable nor systematically power- 
ful. Instead, the particular forms of 
VTE and error relationships can 
probably be more profitably con- 
ceived as dependent on combinations 
of conditions including position pref- 
erences and conflict. Theoretical 
elaboration of these relationships, 
however, is a matter for future con- 
sideration. 


SUMMARY 


Three aspects of empirical data on 
VTE behavior have been considered: 
(a) criteria for VTE’s, (b) antecedents
to and response correlates of VTE’s, 
and (c) VTE’s and learning efficiency. 

Two criteria for scoring VTE be- 
havior have been employed, VTE 
units which involve counting the 
number of times Ss have faced or 
looked at the sides or stimuli of 
choice situations and the VTE trial 
defined as any trial during which one 
or more VTE units were scored. After 
noting the lack of standard criteria 
for VTE and the relative lack of in- 
formation concerning intercriteria 
correlations under various conditions, 
it was suggested that one form of the 
VTE unit, the AB unit, should be 
recorded in all studies. 
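
The relation between the two scoring criteria can be sketched as follows. This is a minimal illustration, assuming that a record of VTE units (for example AB units) is available for every trial; the function name, the dictionary keys, and the five-trial record are all hypothetical. A trial counts as a VTE trial when one or more units were scored on it, as in the definition above.

```python
def score_vte(units_per_trial):
    """Summarize VTE scoring from a per-trial record of VTE unit counts."""
    counts = list(units_per_trial)
    if any(u < 0 for u in counts):
        raise ValueError("VTE unit counts cannot be negative")
    n_trials = len(counts)
    total_units = sum(counts)
    # A VTE trial is any trial on which one or more VTE units were scored.
    vte_trials = sum(1 for u in counts if u >= 1)
    return {
        "trials": n_trials,
        "vte_units": total_units,
        "vte_trials": vte_trials,
        "units_per_trial": total_units / n_trials if n_trials else 0.0,
    }


# Hypothetical record: VTE units scored on each of five trials.
summary = score_vte([0, 2, 1, 0, 3])
```

Because the unit record determines the trial score but not the reverse, recording units in every study would allow both measures to be reported and compared.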

Relationships between VTE and errors in discrimination and maze
learning were summarized and evaluated before consideration of VTE
behavior as a function of stimulus characteristics, motivation and
conflict variables, position preferences, organic, and cage-rearing
conditions. VTE behavior was then related to length of entrance into
the cul-de-sac, correct responses, hesitation time, initial facing or
turning responses, position preferences, and other response measures.

The Muenzinger and Tolman hypothesis that VTEing aids learning at
least in discrimination situations was examined. Four criteria for the
hypothesis were specified and examined in connection with given
studies. In terms of these criteria the lack of consistent findings
with respect to VTE and error relationships in either discrimination
or maze situations raised doubts concerning the learning efficiency
hypothesis. An alternative explanation based on an analysis of the
role of position preferences supplemented by principles of conflict
was then advanced.

Three views are at present current among students with respect to
backward conditioning. (a) Backward conditioning is a genuine variety
of conditioning but such conditioning is not readily obtainable nor
very pronounced, and in the course of further CR training tends first
to diminish and then to disappear and to acquire inhibitory
characteristics (Pavlov’s revised view, 18, p. 393; earlier view:
backward conditioning not possible, ibid., p. 27).¹ (b) Backward
conditioning is a genuine phenomenon but its very special
characteristics, particularly that of diminishing through training,
hardly permit it to be classed as true conditioning (Spooner and
Kellogg, 36). (c) Backward conditioning is an artifact, resulting not
from the CS-US pairings but from the US-sensitization or
pseudo-conditioning (Cason, 4; Grether, 9; Harris, 11; Osgood, 17;
Woodworth and Schlosberg, 42).

Unfortunately, however, each of these three views is based only upon a
limited segment of its own evidence, and the third view is in
particular out of line with total evidence. And all certainly are
badly in need of reconsideration in the light of: (a) Russian
experiments that appeared after Pavlov’s two books were published
(1927 and 1928); (b) Russian experiments that were not considered by
Pavlov since they were not performed in his laboratory but in the
laboratories of Beritov, Bekhterev, and Ivanov-Smolensky; and (c) the
fact that by far most of the American experiments on backward
conditioning used human Ss and readily reportable and centrally
controllable Rs which could hardly make their data alone definitive
with respect to conditioning in general. Moreover, it should be
pointed out here that almost all discussions of backward conditioning
seem to have overlooked the very essence of the problem; namely, the
consideration that even unstable and temporary backward CR’s are of
great theoretical significance inasmuch as they obviously cast grave
doubt upon any CR theory that makes “expectancy” the sine qua non to
the formation of conditioning (to some extent they also reflect
adversely upon “reinforcement” theories) and strongly advance
“contiguity” and “contiguity plus” views.

1 The often-quoted statement of Pavlov that backward conditioning “is
insignificant and evanescent” (19, p. 381; Russian: “neznachitel'ny i
skoroperekhodyashchiy,” 21, Vol. 3, p. 92) must be translated as
meaning “of small magnitude and short-lived.”

The present review will thus attempt to offer both a critical
evaluation and a theoretical integration of all the experimental
evidence on backward conditioning. This experimental evidence
consists, to date, of: (a) 13 experiments from Pavlov’s laboratories
(all with dogs);² (b) 4 experiments from other Russian laboratories
(two animal and two human); and (c) 13 American experiments (nine
human and four animal).

2 Only one (13) of the thirteen experiments from Pavlov’s laboratories
is discussed in some detail in Pavlov’s English texts (18, 19). Two
other experiments are mentioned only by name without any detail (they
apparently had not been completed at the time of the writing of the
texts), while the remaining ten are not mentioned at all (nine of the
ten were performed after the texts were published).

All the experiments from Pavlov’s 
laboratories (some performed after 
his death) used dogs as Ss and, with 
the exception of one study in which 
an electric shock was the US, all 
were of classical design: that is, 
feeding as the US, salivation as the 
quantitative CR, and records of the 
animal’s gross behavior as a general 
index of change. The “other Russian
experiments” are: (a) two by Beritov,
one on dogs and one on decorticated
pigeons; (b) one by Shnirman from
Bekhterev’s laboratory with finger-
withdrawal from electric shock in
adult human Ss; and (c) one by
Pressman from Ivanov-Smolensky’s
laboratory with “food-obtaining,”
“view-obtaining,” and “verbal” techniques
in school children. Finally,
the American experiments comprise: 
(a) two early studies with white 
rats, (b) five experiments with the 
eyelid reaction in college students; 
(c) four experiments with finger- 
withdrawal from shock in college 


students; and (d) miscellaneous studies:
one with four


* The two experiments by Nagat: 
y (J. exp, 
Psychol. 1951, 42, 239-246; 333-340 
included here. Lea 


S upon Previously 
s (instrumental CR'’s 
d only in a forward 
lve, in 


A definition of backward conditioning would also seem to be in order.
And to a large extent the definition is rather simple. Backward
conditioning is conditioning in which the conditional stimulus is
activated a short time after the unconditional stimulus. The
activation of the conditioned stimulus may furthermore occur either
(a) a short time after the cessation of the action of the
unconditioned stimulus or (b) a short time after the beginning of such
action. In the latter case, however, one must be warned against
confusing backward conditioning with “cessation conditioning” in which
the conditional stimulus is activated a short time before the end of
the action of the unconditional stimulus⁴ (44) and is intended to
produce not a conditional evocation but a conditional cessation of a
reaction. Again, it should be noted that backward conditioning cannot
refer to operant conditioning since in operant conditioning the
conditional reaction, by definition, produces the unconditional one
and obviously precedes it. Finally, it may need to be mentioned that
backward conditioning should not be a priori bracketed with “backward
association” in verbal learning, to which it may or may not be
related.


EXPERIMENTS FROM PAVLOV’S
LABORATORY


The Krestovnikov experiment. This
is the classical experiment, performed
by Krestovnikov in 1913⁵ and


4 Psychologists who think of conditioning as effected between
reactions rather than between stimuli will probably want to substitute
“reaction” for “stimulus” in this and in the preceding two sentences.
Such psychologists may regard some experiments which are included here
as in reality cases of forward conditioning.

5 Successful backward conditioning was
reported in Pavlov’s laboratory earlier
published in detail in 1921 (13). The experiment was carried out in
Pavlov’s new laboratory (E in separate room, sound-proof animal room,
automatic administration of unconditional and conditional stimuli, and
capillary recording of salivation) and it was rather extensive,
involving four to six months of experimentation with each one of five
dogs. Moreover, all the five dogs had previous well-developed forward
CR’s: two dogs to the sight of a whirligig, one to a tactile stimulus,
one to the odor of camphor, and one to the odor of amyl acetate. The
backward CS’s were as follows: for the two dogs with the whirligig
CR’s, a metronome of b beats per minute and a “loud” bell in one case,
and the odor of por and a “mild” faradic shock in the other; for the
dog with the tactile CR, a metronome of 100 beats per minute and a
“loud” bell; for the dog with the camphor CR, a tuning fork of 3360~
and a “loud” bell; and for the dog with the amyl acetate CR, the odor
of vanillin. The backward delay (i.e., the time interval between the
administration of the food (or 0.1% of HCl) and the application of the
CS’s) was 5 to 10 sec. in four dogs, and 5-10 and 2-3 sec. in the
fifth animal.

ee Pavlov (18, p- 27) and Kres- 
ex nikov definitely state that the 
Sc nent showed absolutely no 
We acy of the formation of a back- 
tise CR. However, a detailed per- 
et of the results leads to 4 less 
tegorical conclusion. True, two of 
ee ee aes 
pimeney (dissertation, 1907). However, 
lov's ala s experiment was performed in Pav- 
Plete so Bboratory and his data are incom- 
hat not much significance should be 


at 
‘tached to his findings. Pavlov himself men- 


ìoned Pi * Sag al 
of h Pimenev’s results only in passing in one 


156). Wednesday seminars (20, Vol. 1, P- 


the five animals failed to reveal any 
signs of a backward CR even after 
several hundred backward reinforce- 
ments, and one of the two had his 
backward CS in the same modality 
as his forward CS—that is, his for- 
ward CR was to the odor of amyl 
acetate and he could not develop a 
backward CR to the odor of vanillin 
after 427 backward reinforcements 
in three and a half months of experi- 
mentation (Pavlov’s laboratory sel- 
dom uses more than half a dozen 
reinforcements per day). Moreover, 
in these two animals, the hundreds 
of backward reinforcements did not 
seem to have any facilitatory or 
inhibitory effects upon subsequent 
formations of forward CR’s, nor any 
facilitatory or inhibitory effects when 
the backward CS’s were applied 
simultaneously or in close succession 
of CS's of the previously developed 
foward CR's. Yet, in the three other 
dogs, the protocols of the experi- 
ment do show backward condition- 
ing, even though the conditioning 
might be regarded as unstable, 
sporadic, and of small magnitude. 
In two of the three dogs, the course of 
the development of the backward 
conditioning, interestingly, was quite 
similar to the one reported by 
Spooner and Kellogg (36) 34 years 
later: evident in the early stages of 
the training but diminishing and 
disappearing in later stages. But in 
the third of the three dogs, the de- 
velopmental course was V-shaped: 
small in magnitude in early stages, 
absent in the middle stages, and re- 
appearing in somewhat larger magni- 
tudes in the last stage of training. 
Nonetheless, Pavlov himself was 
apparently not fully convinced of the 
finality of Krestovnikoy’s negative 
results, as we note that in the early 
twenties he set a number of his 
students to reinvestigate the entire 
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problem. What disturbed Pavlov 
was evidently not so much the im- 
possibility of forming backward CR’s 
as the reported fact that the back- 
ward CS’s remained indifferent stim- 
uli despite several hundred admin- 
istrations while, according to Pav- 
lovian “cortical dynamics,” stimuli 
which do not become conditioned 
quickly acquire inhibitory tenden- 
cies. Hence, the task assigned to the 
students was to study “what hap- 
pens in the cortex during backward 
conditioning” with an underlying 
objective of looking for the develop- 
ment of inhibition. 

Experiments reported in 1927-1928. 
Anokhin (1) was the first to restudy 
backward conditioning but only in 
an exploratory manner. He used 
only 12 backward trials in one dog 
and was concerned not with the formation
of backward CR’s but with
the aftereffects of backward com- 
binations upon previously formed 
forward CR's to the same and to 
different stimuli. The forward CR’s 
were to a light, a metronome, and a 
bell; and at first the light, the weakest 
CS, and then the bell, the strongest 
one, were applied once a day at a 
backward delay of 2 sec. instead of 
the usual 30-sec. forward delay.
However, while the five backward 
light combinations reduced greatly 
the forward CR to the light and re- 
duced considerably the CR’s to the 
two other CS's, the seven backward 
bell combinations had no such ef- 
fects. Anokhin explains his results 
by stating that the light being a 
weaker CS naturally induced a 
greater amount of inhibition, an 
explanation that obviously does not 
go too far. 

Podkopayev (24) performed a 
much more extensive experiment 
lasting several months. He used two 
dogs, and studied primarily the formation of backward CR’s and only
secondarily the aftereffects of such formations. His backward delay
was 2 sec. and his backward CS’s were a light in one animal and a
thermal stimulus of 1°C. in the other. But here, too, the results were
not uniform. No backward CR was formed to the light after a large
number of backward trials but a backward CR of small magnitude was
formed to the thermal stimulus after only a few backward trials.
Moreover, the backward light combinations are reported to have
inhibited the subsequent formation of a forward CR to the light, while
the backward thermal stimulus combinations are said to have
facilitated the subsequent formation of a forward CR to the same
stimulus. Podkopayev’s data on the facilitation and inhibition are not
convincing since no direct evidence is available on what the course of
the forward conditioning would have been without the preceding
backward trials. But his results on the formation of the backward CR’s
themselves seem to be quite clear-cut.

Rite (32) introduced the methodological modification of forming first
a forward CR to some CS, then “reversing the order” and studying
backward experimentation with the same CS, and then trying the forward
order again. She worked with two dogs, using a backward delay of 2-3
sec. and CS’s of a light and a bell in one dog and of a light and a
dampened tuning fork in the other. Her results are here somewhat more
consistent. In no case did the backward combinations, ranging from 49
to [...], abolish completely the forward CR’s to the same CS’s. But the
magnitudes of the forward CR’s were reduced considerably—from
one-[...] to nine-tenths of initial values—in three of the four cases,
while in the case of the dampened tuning fork the CR remained
practically unaffected. However, unlike those in the previously cited
experiments, the backward combinations did not affect subsequent
formations of forward CR’s to the same CS’s.

Soloveichik (35) worked with two dogs using the same methodology as
Rite but with backward delays of 5 sec. and [...] periods of total
experimentation. He found that 82 backward trials abolished completely
a forward CR to a light and that 53 such trials reduced a CR to a weak
noise from 50 to 3 units of conditioned saliva per half-minute.
However, a forward CR to a loud noise in the same dog was unaffected
by 135 backward reinforcements, while in the other dog 291 backward
trials (about three months of experimentation) were needed to reduce a
forward CR to a bell from 26 to 4 units of conditioned saliva per
half-minute.

In short, the experiments performed in the early twenties, unlike the
Krestovnikov study, showed that backward CR’s, even though small in
magnitude and unstable, could be formed under certain circumstances.
Moreover, these CR’s were by all accounts genuine CR phenomena and not
instances of pseudo-conditioning since they were in no case elicited
by stimuli that had not been paired with the US’s. On the other hand,
the Russian claim that backward CR’s specifically assume in their
formation inhibitory characteristics has not been borne out, in the
reviewer’s opinion, by the experiments so far cited. Such
characteristics manifested themselves as a rule only after a large
number of backward combinations and the fact is that even a large
number of forward reinforcements tend to reduce CR’s and produce
so-called “extinction through reinforcements” (Russian: “Ugasheniye s
podkrepleniyem”).

Experiments reported in 1933.
Kreps (12) worked with one dog using 
a backward delay of 2-3 sec. and 
backward CS’s of a thermal stimulus 
of 45°C. and of the odor of camphor. 
At first, no CR was in evidence after 
100 backward reinforcements in 40 
consecutive experimental days. But 
when the experiment was interrupted 
for a week and a number of the dog’s 
previous forward CR's were reacti- 
vated to raise the level of “cortical 
excitability,” a backward CR to both
CS’s was formed in 5-6 backward
trials. The CR’s were, however, 
quite unstable, and Kreps seems to 
have been the first E to show clearly 
that the backward CR’s in his dog 
retarded substantially subsequent 
formations of forward CR’s to the 
same stimuli. 

Pavlova (22) worked with three 
dogs studying the effects of backward 
conditioning upon both the acquisi- 
tion and the extinction of forward 
CR’s. She used backward delays of 
“several seconds” and three separate 
procedures with appropriate controls 
to study their effects. (a) Backward 
combinations were rotated with for- 
ward combinations in the process of 
conditioning the dogs to a light and 
a whistle while only forward com- 
binations were used in conditioning 
them to a tactile stimulus and to the 
bubbling of water. (b) Reinforced 
backward trials were substituted for 
nonreinforced trials in a partial-rein- 
forcement technique in which a light 
as a CS was reinforced every third 
trial in a forward direction. (c) The 
extinction of a CR to a bell, in which 
reinforced backward trials were inter- 
spersed, was compared with regular 
extinction of a CR to a hissing sound.
(The CR strengths of the experimental
and the control CS’s as otherwise
determined in the laboratory are
said to have been equal in (a) and in 
(c).) Pavlova’s data leave no doubt 
that, while the backward CR trials 
were by no means as effective as for- 
ward ones in producing conditioning, 
they certainly should be classed as 
positive CR variables inasmuch as 
they all aided considerably both the 
development of forward conditioning 
and the resistance of the condition- 
ing to extinction. 

Vinogradov (39) worked with two 
dogs that had no previous forward 
CR's. He used a backward delay of 
15 sec., a combined US-CS activation 
of 15 sec., and a light, a metronome, 
a noise, a bell, and a tuning fork as 
CS's. Backward CR’s were formed in 
all cases and they all were of consider- 
able magnitude. However, they did 
develop more slowly and tended to 
diminish upon repeated reinforce- 
ments more than forward CR’s. 
Thus, a backward CR to a metro- 
nome of 100 beats per minute first 
appeared after 12 reinforcements at a 
magnitude of 5 units of conditioned 
saliva per 30 sec. and continued 
to increase in magnitude until after 
67 reinforcements it equalled a maxi- 
mum of 30 units per 30 sec. But then 
it began decreasing and finally dis- 
appeared completely after 135 rein- 
forcements. (Forward CR’s to metro- 
nomes usually appear after 4-8 rein- 
forcements, reach magnitudes of 50- 
60 units, and are likely to diminish 
only after several hundred reinforce- 
ments.) 

Petrova (23) performed an exten- 
sive experiment with three dogs which 
had a number of previously formed forward CR’s. She used a backward
delay of 5 sec., a US-CS combination of 15 sec., and backward CS’s of
a hissing sound and of a rising white figure on a black background.
Backward CR’s of small magnitude were formed in all cases but, as in
Vinogradov’s study, they, too, invariably began to diminish in the
course of training and finally disappeared. However, Petrova continued
to administer a large number of backward reinforcements after their
CR’s disappeared and found that these reinforcements, as well as
nonreinforced applications of backward CS’s, came to affect negatively
not only the magnitudes of the animals’ forward CR’s but also the
animals’ total behavior. “Neurotic” disturbances, “hypnotization,” and
“paradoxical states”—strong stimuli producing weak effects and weak
stimuli strong effects—were produced in two of the three experimental
animals.

Thus, the experiments reported in 1933 have brought out that (a)
backward CR’s could certainly be formed more readily than it was
thought earlier, and that (b) the magnitudes of backward CR’s may be
comparable to those of forward CR’s, but that (c) the backward CR’s
tend to diminish and disappear in the course of training much more
commonly than do forward CR’s. Again, the 1933 experiments indicate
that the alleged inhibitory properties of backward CS’s appear only
after the backward CS’s cease evoking CR’s, and disclose the
interesting finding that backward CR’s are more readily formed with
delays of 12-15 sec. than with those of 5 sec. and more with the
latter than with delays of 2-3 sec.

Experiments reported in 1940. Nezhdanova (16) experimented with weak
US’s, the administration of 0.1% of HCl which was strengthened to
0.3%. The backward delay was 3-5 sec., the backward CS was the bubbling
of water, and the US-CS combination lasted 10 sec. Five trials were
made each day: four reinforced training trials and one nonreinforced
test trial, and the intertrial intervals were always 5 min. The
backward CR appeared after the twelfth reinforcement and continued to
increase in magnitude in regular fashion. There was no decline in the
backward CR after 445 reinforcements and in all respects it resembled
a regular well-established forward CR.

Stroganov (37) studied the effects of converting forward CR’s into
backward ones in two dogs. He used a backward delay of 2-3 sec.,
bubbling of water as a backward CS, and found that the effects
differed with each animal. In one animal, the backward combinations
diminished the forward CR while in the other they increased it.
Stroganov explains his results in terms of Pavlovian typology, an
explanation that has but little meaning when only two animals were
used in the experiment.

Narbutovich (15) studied backward conditioning in two dogs in which a
mild “unavoidable” electric shock was the US and a bell was the CS. He
used two separate experimental procedures. In one dog, the shock
lasted for 2-3 sec. and the bell [...] reinforcements; with the second
procedure, no evidence of any CR was noted after 56 reinforcements.

The experiments reported in 1940 have thus added two significant facts
to the knowledge of backward conditioning: (a) the fact that an
undiminishing backward CR of high magnitude could be formed under
certain circumstances; (b) the fact that with electric shock as the
US, a backward CR may be formed when the CS is applied immediately
after the cessation of the US but not when it is applied during the
action of the US.


OTHER RUSSIAN EXPERIMENTS 


Beritov's experiments. Beritov ex- 
perimented with two dogs (2) and 
with a decorticated pigeon (3). He 
used “unavoidable” electric shock as 
the US, and CS's of the sound of a 
metronome and of an organ pipe, 
and the flashing of a light. He seems
to have been most successful with
the decorticated pigeon in which a
stable withdrawal CR was established
by flashing the light 2-3 sec.
after the cessation of the application
of the shock. But a backward CR was
also formed in one dog by sounding
a tone of 512 on the organ pipe immediately
or 1-3 sec. after the cessation
of an electric shock of 2-3 sec. duration
to the animal’s right forepaw.
This CR developed quickly after 5 
reinforcements but did not attain 
stability even after 100 reinforce- 
ments and was extinguishable in 2-3 
nonreinforced trials. However, both 
Beritov’s dogs failed to show any 
evidence of backward conditioning 
when (a) the shock lasted 30 sec. and 
a metronome was sounded 3-10 sec. 
after the beginning of the shock; and 
(b) a light was flashed 10-20 sec. 
after the cessation of a strong shock. 
Beritov's results thus corroborate 
those of Narbutovich (supra) that 
with shock as the US, backward CR’s 
may be formed when the CS is ap- 
plied after the cessation of the US 
but not when it is applied during the 
action of the latter. 

Shnirman’s experiments. The original report of Shnirman’s experiment
was, unfortunately, unavailable to the reviewer who thus could only
restate Russian summaries that “Shnirman worked with a large number of
adult human Ss and found backward conditioning possible when the sound
of a bell was applied 2-3 sec. after an electric shock as the US”
(2, 15).

Pressman’s study. Pressman (26) attempted backward conditioning in 14
school children—7-12 years of age—with the use of four different
techniques: (a) finger-withdrawal from faradic shock; (b) so-called
“Ivanov-Smolensky food-obtaining technique” in which the child presses
a rubber bulb to obtain some food, usually candy; (c) so-called
“Ivanov-Smolensky orientating or view-obtaining technique” in which
the child presses the bulb to view some scenery; and (d) a “verbal
technique” in which the child presses the bulb in response to E’s
command. The CS’s were in all cases auditory stimuli—a bell, a
crackling noise, and tones C and H on a Hornbostel variator—and were
always applied immediately after the cessation of the US—pressing the
bulb or electric shock. The finger-withdrawal experiment lasted only
for two sessions in which only 40 reinforcements were made, and its
negative results could not be too conclusive. However, the three other
experiments continued for hundreds of trials, and yet in no case was a
stable backward CR established, while in most cases it was not formed
at all. The children’s spontaneous verbal reports of “What is the bell
for?” “Say please what is the sense of the bell?” etc., are also of
interest.


AMERICAN EXPERIMENTS 


Early experiments with white rats. The first American study of what
would now be called backward conditioning was performed by Carr and
Freeman (5) with nine white rats. The rats failed to learn, in 1,500
trials, to “turn around and retrace a path” at the sound of a buzzer
when the buzzer had been sounded each time approximately 1 sec. after
the door of the path leading to food was closed, while two other
groups of rats learned the association when the buzzer was sounded
simultaneously with the closing of the door or approximately 1 sec.
before the door was closed (the performance of the “simultaneous”
group being inferior, though, to that of the “1 sec. before” group).
Yarbrough (43) modified the Carr and Freeman study by combining an
electric shock with the buzzer which in a large way gave rise to a
different learning task. His 38 rats first had to learn to turn around
and retrace their pathway upon the administration of a shock, and then
to do the same at the sound of a buzzer which preceded, followed, or
was applied simultaneously with the shock. Two “backward”
conditions—buzzer following immediately the cessation of the shock and
following it 1 sec. after—four “forward” conditions—buzzer 1 sec., 2
sec., 4 sec., and 6 sec. before the shock—and one “simultaneous”
condition were used with seven different subgroups of the 38 rats. And
Yarbrough’s results were quite different from those of Carr and
Freeman. Both “backward” groups mastered the shock-buzzer association
quite well. The “immediately after” group was almost as efficient as
the “1 sec. before” group and the “1 sec. after” group was about [...]
per cent as efficient as the “1 sec. before” group.

Eyelid conditioning. In Cason’s earlier experiment (7), one S was
allowed to wink “naturally” for five [...] during which “several
thousand winks must have been made,” and these winks were connected
with a telegraph sounder which clicked when the natural wink was
“practically completed.” Yet, 40 trials with the click alone did not
produce any wink reaction. Cason duplicated this experiment 13 years
later with 8 Ss during two to four hours producing a total of 17,727
natural winks and then tested 304 times with the sound alone. But only
32 winks were obtained in the 304 tests and 21 of the 32 occurred “in
the 2 Ss ... who had the fastest natural winking rates” (8, p. 605).

Again, with “regular” CR procedures, Bernstein’s (4) experiment shows
only a small amount of backward conditioning, Switzer’s (38) discloses
a large amount of it, while Porter (25) maintains that he found no
backward conditioning at all. Bernstein used a mild shock as the US, a
click as the CS, and backward intervals of .5 and .9 sec. Switzer used
a mechanical tap as the US, a [...] as the CS, and backward intervals
of .5-2.0 sec. And Porter used a puff of air as the US, a brief flash
of light as the CS, and backward intervals of .47 and .98 sec.
Bernstein maintained that some of Switzer’s positive results might be
due to “facilitation” (pseudo-conditioning) which may very well be
true. But it is very unlikely that all of Switzer’s results could be
accounted for in this way. The fact is that the status of backward
eyelid conditioning could hardly indeed be regarded as settled. No one
of the three Es (leaving out the Cason study as a special case)
continued his experiment for more than one session—maximum of [...]
US-CS pairings—and all used adult human Ss and overlooked the
consideration that backward conditioning may require special
circumstances to be obtained and maintained and still be a genuine CR
phenomenon. The entire problem is in need of reinvestigation on a
larger scale and with a fresher outlook.
Finger-withdrawal from shock. 
Wolfle’s two experiments (40, 41) 
with ‘avoidable’ shock are well 
known. She used in her first experi- 
ment 10 Ss with a backward interval 
of .25 sec. and 10 with one of .5 sec., 
while the backward intervals in her 
second experiment were .2, .6, 1.0, 
and 2.0 sec. used with 24 Ss divided 
into groups of 5-7 Ss each. In the 
first experiment, the backward inter- 
vals of .25 and .5 sec. yielded respec- 
tively 10 and 13 per cent of condi- 
tioning as compared with 29 and 37 
per cent obtained with forward inter- 
vals of the same duration. In the 
second experiment, the backward in- 
tervals of 1.0 sec. actually produced 
more conditioning than the forward 
intervals of 1.0 sec.—11 vs. 7 per 
cent—while backward intervals of .2 
and .6 sec. produced only 7 and 10 
per cent conditioning as compared 
with 52 and 37 per cent for forward 
intervals of the same duration. There 
is no doubt that Wolfle’s experiments 
show backward conditioning, and the 
reviewer is by no means ready to 
attribute it all to pseudo-condition- 
ing. However, larger-scale experi- 
mentation is needed here even more 
than in the case of the eyelid condi- 
tioning, since the conditioning of 
finger-withdrawal from shock is, as is
known, notoriously variable (28).⁶
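
The comparison of Wolfle's percentages can be made explicit by expressing each backward percentage as a fraction of the forward percentage obtained at the same interval. The following Python sketch is an illustration only: the percentages are those quoted above, while the dictionary layout and the function name `backward_to_forward` are assumptions introduced here. The 2.0-sec. interval is left out because no percentage is quoted for it.

```python
# Percentages of conditioning quoted in the text for Wolfle's
# experiments (40, 41), keyed by interval in seconds.
# Each value is (backward per cent, forward per cent).
wolfle = {
    "exp. 1": {0.25: (10, 29), 0.5: (13, 37)},
    "exp. 2": {0.2: (7, 52), 0.6: (10, 37), 1.0: (11, 7)},
}


def backward_to_forward(results):
    """Return {interval: backward per cent / forward per cent}."""
    return {t: b / f for t, (b, f) in results.items()}


# Hypothetical summary structure: one ratio table per experiment.
ratios = {name: backward_to_forward(r) for name, r in wolfle.items()}

for name, by_interval in ratios.items():
    for interval, ratio in sorted(by_interval.items()):
        print(f"{name}  {interval:.2f} sec.  backward/forward = {ratio:.2f}")
```

On these figures the backward-to-forward ratio lies between about 0.13 and 0.35 at intervals of .6 sec. or less, and exceeds 1 only at the 1.0-sec. interval.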


6 The number of trials needed to form a shock-withdrawal CR in 21 Ss
in three Russian experiments was noted by the reviewer to range from 7
to 1,340. The mean was 213 and the SD was 255.1. American Es commonly
report shock-conditionings in human Ss in terms of percentages of Ss
conditioned in one session, and as a rule do not pursue the study of
the CR processes in the nonconditioned Ss. Russian Es have been known
to continue their shock-conditioning in human Ss for several months.

Harris (11) used 13 Ss who “were acquainted with the essential facts
of conditioning,” a “loud tone” of 4.75 sec. duration which followed
immediately a shock of .25 sec. duration as the CS, and one
experimental session of 80 US-CS pairings. A certain amount of
backward conditioning was obtained which the reviewer, unlike Harris,
believes could not justifiably be attributed “almost completely” to
nonassociative factors or pseudo-conditioning by Harris’ own data.

Spooner and Kellogg (36) no doubt performed the best controlled and
the most extensive backward CR experiment in this country. They used
Ss who were “naive with respect to a knowledge of the psychology of
learning and of the nature of conditioning,” an “unavoidable” but
adjusted shock intensity which always produced a 6-in. withdrawal
movement, complete polygraph recordings of not only stimuli and
responses but also of latencies, and an experimental session which
lasted for two hours and was divided into five “experimentation
blocks” separated by 5-min. rest-periods. The backward intervals were
.25 sec. with one group of 10 Ss and .5 sec. with another 10 Ss, and
their results certainly warrant their conclusion that “backward
conditioning exists and that it must be accepted as an established
fact” (ibid., p. 328). On the other hand, the reviewer does not concur
with the Spooner and Kellogg statement that “backward conditioning is
apparently an entirely different phenomenon from forward conditioning”
(ibid.; ital. in text). The statement seems to be based primarily upon
findings that


the backward CR’s had shorter latencies than the forward CR’s—mean of
.280 sec. for backward intervals of .5 sec. vs. mean of .491 for
forward intervals of the same length—and that the backward CR
diminished in frequency in the course of usual CR training. However,
neither of these would seem to justify divesting backward conditioning
of its rightful name. Latencies of CR are functions of CS-US
asynchronism and there is no reason why radically different
asynchronisms should not produce radically different latencies. The
fact is that the mean latency of the CR’s with forward intervals of
[...] sec. was .750 which means that it differed more from the mean
latency of the forward CR with .5-sec. intervals than the latter
differed from the mean latency of backward CR’s with .5-sec.
intervals. And with respect to CR diminution with training, it will be
remembered that the backward CR’s in the early Russian experiments
also manifested such diminution but that later adjustments in lengths
of backward delays and strengths of CS’s and US’s proved the
diminution to be not an invariable characteristic of backward
conditioning.⁷
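
The latency argument above reduces to a comparison of two differences. A minimal Python check follows, using only the three mean latencies quoted in the text. The variable names are illustrative assumptions, not the authors' notation, and the length of the longer forward interval is not needed for the comparison.

```python
# Mean CR latencies in seconds, as quoted in the text from
# Spooner and Kellogg (36).
backward_05 = 0.280     # backward intervals of .5 sec.
forward_05 = 0.491      # forward intervals of .5 sec.
forward_longer = 0.750  # the longer forward interval cited in the text

# Difference between the two forward conditions.
gap_within_forward = forward_longer - forward_05

# Difference between forward and backward conditions at .5 sec.
gap_across_direction = forward_05 - backward_05

print(f"forward vs. forward:  {gap_within_forward:.3f} sec.")
print(f"forward vs. backward: {gap_across_direction:.3f} sec.")
```

The two forward means differ by about .26 sec., whereas the forward and backward means at .5 sec. differ by about .21 sec., which is the point of the comparison.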

7 Through an oversight, the experiment of Fitzwater and Reisman missed
being included in the present review. Fitzwater and Reisman used
avoidable finger-withdrawal with ten college students, and found
little evidence for backward conditioning.

Miscellaneous experiments. Grether (9) established a backward
“emotional” CR in two monkeys by using an explosion of flash powder or
a “snake blowout” as the US and a bell as the CS; but then found that
two control monkeys formed the same CR through pseudo-conditioning.
Similarly, Harlow (10) showed pseudo-conditioning to be as effective
as, if not more effective than, either backward or forward
conditioning in four goldfish when a strong shock was used as the US
and a mild shock as the CS. Both these experiments are of prime
significance in focusing the need for “pseudo-conditioning controls”
which Pavlov’s laboratory recognized years ago. But they obviously
shed no light on the problem of the specific existence and nature of
backward conditioning. Their stimuli and subjects—very strong US’s in
one case and animals low in phyletic scale in the other—are indeed too
special to warrant generalizing their data to other more usual CR
situations. Finally, it might be mentioned that Switzer (38) reported
in a preliminary study successful backward patellar conditioning in 2
of 5 Ss and failure of backward GSR conditioning in 3 other Ss, and
that Cason’s early pupillary conditioning (6) was really backward in
essence, the bell following the beginning of the action of the light.


INTEGRATIVE DISCUSSION
AND THEORY


The sum total of the reviewed experimental studies leaves no doubt
that backward conditioning is a genuine CR-associative phenomenon that
is obtainable and maintainable under special conditions. The positive
evidences for it are not only more numerous but, as revealed in the
experiments of Vinogradov and Nezhdanova, clear-cut, otherwise
unaccountable, and a product of months-long observations on animal Ss.
The negative evidence, on the other hand, as found in the experiments
of Porter and of Bernstein comes from short one-session studies with
adult human Ss that could hardly be regarded as either definitive or
exemplary with


respect to conditioning in general. 
Hence, the hypothesis advanced by a 
number of American experimenters 
and writers in this area, that back- 
ward conditioning is an artifact of 
pseudo-conditioning, might as well 
be rejected outright. The hypothesis 
would probably not have been offered 
if the essence of the reviewed Russian 
findings were known in this country. 
Moreover, a tenable argument can 
be made for not accepting the hy- 
pothesis even on the basis of avail- 
able American data. The best-con- 
trolled American experiment with 
human Ss, that of Spooner and Kel- 
logg, and the only American animal 
experiment in which standard ani- 
mals and standard CS's and US's 
were used, that of Yarbrough, do not 
support it. 

Nonetheless, it is also obvious in 
the cited studies that backward con- 
ditioning presents difficulties and 
requires special conditions for forma- 
tion and maintenance. As so far 
noted, the special conditions for formation
appear in the main to be: (a)
a US that is not too strong (Nezhdanova,
16) as well as a CS that is
not too weak (Podkopayev, 24); (b)
with food as the US, a US-CS delay
of some length (5-15 sec.; Soloveichik,
35; Vinogradov, 39); and (c)
with shock as the US, applying the
CS after the cessation and not during
the action of the US (Beritov, 3;
Narbutovich, 15). More special
conditions might of course be re- 
vealed in future experiments but the 
present ones certainly seem to be 
little in accord with either a CR 
theory of “drive reduction” or one of 
“contiguity” in which coincidence of 
UR and movement-produced stimuli 
is assumed, not to mention ‘‘cognitive 
and expectancy” CR theories which, 
as indicated earlier, are in general out 
of line with any backward conditioning. Indeed, about the only view
which the special conditions of backward conditioning would fit is the
one of “dominance-contiguity” which the reviewer suggested some years
ago (28, 29) and which in essence is only a behavioral
reconceptualization of the doctrines of Pavlov and of Ukhtomsky and in
a number of respects not unrelated to Lashley’s concept of “what is
associated” (14) and to F. Sheffield’s discussions of consummatory
behavior and “drive induction.” According to this view, (a)
conditioning requires not only a minimum of US and CS strength but
also an optimum strength-ratio between the two: the US must be
considerably stronger than the CS to dominate it but not so much
stronger as to completely “ground” it; and (b) in CR situations, the
dominance of a stimulus is a function not only of its intensive but
also of its temporal characteristics: precedence in time enhances
dominance greatly, while duration gradually lowers it. Hence, (a) the
difficulties encountered in backward conditioning may be conceived of
as being due to a US that is overdominant through combining its own
intensive dominance with the dominance of precedence in time; and (b)
conditioning may be effected when the US is either comparatively weak
to begin with or becomes weakened through duration (with food as the
US) or cessation (with shock as the US). (In Pavlovian terms, it means
that the US must be strong enough to produce irradiation of excitation
but not so strong as to produce negative induction, while subjectively
we may say that in backward conditioning S must not be so much
preoccupied with the US that the CS is not attended to.)
On the other hand, the diminish-
ability of backward conditioning in
the course of usual CR training,
which while not an invariable charac-
teristic (16) is still probably more
common with backward than with
forward CR’s, might be interpreted
in two ways. First, it may be argued
that the formation of backward CR’s
of a CS-US direction is often paral-
leled by a formation of “reverse”
forward CR’s of a US-CS type—that
is, the US’s become CS’s and the
CS’s, US’s—and that these “reverse”
forward CR’s come to interfere with
the backward conditioning.
Beritov (3) and Bernstein (4, pp.
192-194) clearly demonstrated that
the shock stimuli (the US) in their
experiments came to evoke sound
(CS) reactions, and the many in-
stances of reduced UR’s in the back-
ward conditionings of Pavlov’s labo-
ratories are no doubt further illustra-
tions of this characteristic. Second,
it may be contended that at least in
human Ss the perception of the
stimulus relations in the CR situation
arising in the course of CR training
tends to reduce and nullify the
formed backward CR’s. Or, in other
words, while the acquisition of back-
ward conditioning is assumed to pro-
ceed along noncognitive S-R lines, its
extinction may well be conceived of
as involving cognitive-perceptual fac-
tors. To be sure, the extinction of
backward CR’s is more often than
not gradual in development. But
this by itself could not be held as an
objection to the present assumption,
since in uninstructed Ss (Ss whose
attitudes to, and cognition of, the
situation are not controlled by special
instructions) the manifestation and
effectiveness of cognition in condi-
tioning is in itself often progressive in
nature. Lastly, it should be noted
that the mechanisms that may under-
lie the two interpretations—(a) re-
verse conditioning interference and
(b) cognitive factors—are not likely
to be mutually exclusive, so that
either or both may well be operative
in any particular CR situation.


SUMMARY


1. Data of 13 experiments from
Pavlov’s laboratory (all with dogs), 4 
experiments from other Russian lab- 
oratories (two animal and two hu- 
man), and 13 American experiments 
(nine human and four animal) on 
backward conditioning were analyzed 
and evaluated. 

2. On the whole, the analyzed evi-
dence is unmistakable in demon-
strating that backward conditioning
is not a case of pseudo-conditioning
but is a genuine CR-associative mani-
festation, and that stable backward
CR’s can be obtained and maintained
under favorable experimental condi-
tions.

3. Roughly, the conditions favor-
able for the formation of backward
CR’s are a US that is not too strong
and a CS that is not too weak. With
food as the US, backward condition-
ing is more readily obtained when the
US-CS interval is 15 sec. than when
it is 5 sec. and more readily with 5
sec. than with 2 sec. intervals. With
shock as the US, backward condi-
tioning seems to be possible only
when the CS is applied after the
shock has ceased and not during the
action of the shock.

4. In general, the main evidence
on the formation of backward condi-
tioning—both the favorable and the
unfavorable factors—does not fit well
the CR theories of either Guthrie or 
Hull or Tolman. It could, however, 
be accounted for by the writer's 
“dominance-contiguity” view which 
presupposes a favorable US-CS ratio 
of strength for effective condition- 
ing—that is, the US must be domi- 
nant but not excessively dominant 
over the CS; or, in somewhat subjec- 
tive terms, the organism must not be 
so much preoccupied with the US as 
to fail to attend in some degree also 
to the CS. 

5. Backward conditioning is more 
subject to “extinction through rein- 
forcement” than is forward condi- 
tioning, especially in the case of hu- 
man Ss. A possible explanation for 
the phenomenon might be either 
that (a) Ss develop in the course of 
the backward conditioning a counter 
perception which centrally reduces 
the conditioning, or that (b) “re- 
verse” conditioning is set up—CS’s 
becoming US's and US's becoming 
CS's—as the temporal dominance of 
the CS’s exceeds the intensive domi- 
nance of the US's. Experimental 
evidence for the operation of both 
mechanisms in conditioning has been
known.

6. The fact that (a) with food as
the US backward conditioning is
more effective with 15 sec. US-CS
delays than with shorter intervals
and that (b) with shock as the US it
is effective only when the CS is ap-
plied after the shock has ceased, may
be taken as evidence against
any assumption that backward CR’s
are merely forward CR’s to traces
or aftereffects of CS’s.


3. Beritov, I. S. Über die individuell-
erworbene Tätigkeit des Zentralnerven-
systems bei Tauben. Pflüg. Arch.
Physiol., 1926, 213, 370-406.
Individually-acquired
Harvard University*


A very large body of experimental
results has accumulated in the field
of operant, or instrumental, condi-
tioning of the rat, the pigeon, and
other experimental animals. The
application to human behavior of the
laws generated by such research is
most often done by the use of theory.
An alternative method is to demon-
strate that the manipulation of
classes of empirically defined vari-
ables that produces specific and highly
characteristic changes in the beha-
vior of small experimental animals in
Skinner boxes produces similar
changes in the behavior of college
students.

This paper reports procedures for 
the direct application of the variables 
defining the paradigm for operant 
conditioning to human behavior, 
and shows that human beings act 
very much indeed like experimental 
animals when they are subjected to 
the same experimental treatments.
It suggests that direct application of 
conditioning principles to some cate- 
gories of human behavior may be 
justified. The procedures are simple, 
and they may be followed by anyone, 
with a minimum of equipment. 

That it is possible to condition hu-
man motor behavior will surprise few
who are concerned with behavior
theory. Nevertheless, it has not
always been clear what behaviors will
act as “responses,” what events will
prove to be “reinforcing stimuli,” or
exactly what procedures would most
readily yield reproducible results.
This paper describes methods that
have been worked out for easy and
rapid operant conditioning of motor
behavior in humans, states char-
acteristic findings, and reports sample
results. Developed in a series of
exploratory experiments in an ele-
mentary laboratory course in psy-
chology, the methods may have a
wider utility.


1 The substance of this paper was presented
to the Psychological Society, University
College, University of London, in May, 1953.
The writer wishes to express his thanks to the
many students whose data were made avail-
able to him.

* Now at Stanford University.


Development of the Method 


In one year’s class in the introduc-
tory laboratory, an attempt was
made to reproduce the Greenspoon
effect (1), in which the rate of saying
plural nouns is brought under experi-
mental control by the use, as a rein-
forcing stimulus, of a smile by the
experimenter, or by his saying “Mm-
mm,” or “Good.” The results were
indifferent: a few students had good
success with some subjects; the ma-
jority failed with all their subjects.
The successful students seemed, cas-
ually, to be the best-looking, most
mature, most socially acceptable;
they tended to have prestige. This
suggested that the procedure was
effective if S “cared” about E’s be-
havior; that is, if he noticed, and re-
sponded in one way or another to
what E said or did.

This observation is consistent with
the Guthrian (but Skinner-box-de-
rived) view that if one could isolate
any single property shared by rein-
forcing stimuli (whether “primary”
or “secondary”), it would prove to
be that all reinforcing stimuli pro-
duce a vigorous response of very
short latency (2). Greenspoon’s pro-
cedure was therefore modified to
force S to respond to the stimuli that 
E wished to use as reinforcers. There- 
after, the incidence of failures to 
condition human Ss dropped con- 
siderably. 

Using these methods, many kinds
of stimuli have been found to be
reinforcing in the hands of student
experimenters, and a wide variety
of responses have been conditioned.
Data have been gathered on per-
formance under regular reinforce-
ment, and under such other sched-
ules as variable and fixed interval,
and variable and fixed ratio (3, 4),
both in establishing rates of response
and in yielding extinction curves of
appropriate form after the termina-
tion of reinforcement. Experiments
have been done on response differen-
tiation, discrimination training, and
chaining. Indeed, there is reason to
believe that the whole battery of
operant phenomena can be repro-
duced in a short time. Incidental
data have been obtained on “aware-
ness,” “insight,” or what-have-you.
Here is a sample set of instructions
for human conditioning. In pre-
senting the method more fully, we
shall amplify each section of these
instructions in turn.
Procedure: Human Operant Motor
Conditioning

1. Instruction to subject: “Your job is to
work for points. You get a point every time
I tap the table with my pencil. As soon as you
get a point, record it immediately. You keep
the record of your own points—try to get as
many as possible.” As necessary: “I'm sorry,
I can't answer any questions. Work for
points.” DO NOT SAY ANYTHING ELSE TO S.
Avoid smiling and nodding.

2. Reinforcing stimulus: pencil tap.

3. Response: tapping forefinger to chin.
Be sure the tap on the chin is complete
before reinforcing—that is, be sure that S has
tapped his chin and withdrawn his finger. Dur-
ing regular reinforcement, be sure S does not
“jump the gun” and record a point before you
give it to him. If S does this, withhold rein-
forcement and say: “You got no point that
time. You get a point only when I tap the
table. Be sure you get a point before record-
ing.”
4. Procedures: Observe S; determine oper-
ant level of chin-tapping before giving in-
structions.

a. Approximation conditioning of chin-
tap (described later).

b. 100 regular reinforcements of chin-tap.

c. Shift to:

[½ of the subjects] 30-second fixed interval
reinforcement.

[½ of the subjects] fixed ratio reinforcement
at ratio given by S’s rate per 30 seconds.

[When shifting from regular reinforcement
to the schedule, make sure that S doesn’t
extinguish. If his rate has been high, you'll
have to shift him, perhaps, to a 20:1 ratio—
with such a change, S will probably extinguish.
Prevent this by shifting him first to a 5:1 ratio
(for 2 minutes), then to 10:1 (for 2 minutes),
then to 20:1. Similarly, put S on a 10-second
F. I., then a 20-second F. I., and finally on a
30-second one.]

Continue for 500 responses.

d. Extinguish to a criterion of 12 successive
15-second intervals in which S gives not more
than 2 responses in all.

5. Subject’s “awareness”:

[¼ of Ss] Record any volunteered statement
made by S.

[¼ of Ss] At the end of the experiment, ask,
“What do you think was going on during this
experiment? How did it work?”

[¼ of Ss] Add to instructions: “When you
think you know why you are getting points,
tell me. I won't tell you whether you're right
or wrong, but tell me anyway.” At about the
middle of each procedure, ask, “What do you
think we are doing now?”

[¼ of Ss] At the beginning of each procedure,
give S full instructions:

a. “You'll get a point every time you tap
your chin, like this.” (Demonstrate.)

b. “From now on, you'll get a point for
every twentieth response,” or “. . . for a re-
sponse every 30 seconds.”

c. “From now on, you'll get no more points,
but the experiment will continue.”


6. Records:

a. Note responses reinforced during ap-
proximation; record time required, and num-
ber of reinforcements given.

b. Record number of responses by 15-
second intervals. Accumulate.

c. Draw cumulative response curves.

d. Be sure your records and graphs clearly
show all changes in procedure, and the points
at which S makes statements about the pro-
cedure.

e. Compute mean rates of response for each 
part of the experiment. 

f. Record all spontaneous comments of S 
that you can; note any and all aggressive 
behavior in extinction. 
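
The extinction criterion in step 4d (12 successive 15-second intervals containing not more than 2 responses in all) can be checked mechanically from the interval tallies kept under step 6b. The following is only a minimal sketch in Python; the function name `extinction_met_at` and its parameters are illustrative assumptions and are not part of the original procedure.

```python
from typing import Optional, Sequence


def extinction_met_at(counts: Sequence[int],
                      window: int = 12,
                      max_total: int = 2) -> Optional[int]:
    """Return the index of the interval that first completes the criterion.

    counts[i] is the number of responses tallied in the i-th 15-second
    interval of extinction. The criterion is met when `window` successive
    intervals contain at most `max_total` responses in all. Returns None
    if the criterion is never met. (Names and defaults are assumptions
    taken from step 4d, not from the author's own records.)
    """
    running = 0
    for i, count in enumerate(counts):
        running += count
        if i >= window:
            # Drop the interval that has just left the sliding window.
            running -= counts[i - window]
        if i >= window - 1 and running <= max_total:
            return i
    return None


# Hypothetical tallies for one S during extinction.
tallies = [9, 7, 4, 3, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(extinction_met_at(tallies))
```

With 15-second intervals, the returned index plus one, divided by four, gives the minutes of extinction elapsed when the criterion was reached.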


General Notes 


Duration and situation. As short a 
time as 15 minutes, but, more typi- 
cally, a period of 40 to 50 minutes can 
be allotted to condition an S, to col- 
lect data under regular and partial 
reinforcement schedules, to develop 
simple discriminations, and to trace 
through at least the earlier part of 
the extinction curve. The experiment 
should not be undertaken unless S
has ample time available; otherwise 
Ss tend to remember pressing en- 
gagements elsewhere when placed on 
a reinforcement schedule. We have 
not tried, as yet, to press many Ss 
very much beyond an hour of experi- 
mentation. 

The experiments can be done al- 
most anywhere, in a laboratory room, 
in students’ living quarters, or in 
offices. Background distractions, 
both visual and auditory, should be 
relatively constant. Spectators, 
whether they kibitz or not, disturb 
experimental results. 

The E may sit opposite S, so that 
S can see him (this is necessary 
with some reinforcing stimuli), or E
may sit slightly behind S. S should 
not be able to see E’s record of the 
data. In any case, E must be able 
to observe the behavior he is trying 
to condition. 

Subject and experimenter. Any co- 
operative person can be used as a 
subject. It does not seem to matter 
whether S is sophisticated about the 
facts of conditioning; many Ss suc- 
cessfully conditioned, who gave typi- 
cal data, had themselves only just 
served as Es. However, an occasional 


slightly sophisticated S may try to 
figure out how he’s “supposed to 
behave” and try to “give good data.” 
He will then emit responses in such 
number and variety that it is dif- 
ficult for E to differentiate out the 
response in which he is interested.

People who have had some experi- 
ence with the operant conditioning 
of rats or pigeons seem to become 
effective experimenters, learning 
these techniques faster than others. 
The E must be skilled in delivering 
reinforcements at the proper times 
and in spotting the responses he 
wants to condition. With his first 
and second human S, an E tends to
be a little clumsy or slow in reinforc-
ing, and his results are indifferent.
About a third of our students are not 
successful with the first S. Practice 
is necessary. 

Apparatus. The indispensable 
equipment is that used by E to
record—a watch with a sweep second
hand, and paper and pencil. Beyond
these, the apparatus man can have a
field day with lights, bells, screens,
recorders, and so on. This is unneces-
sary.


Instructions 


Conditioning may occur when no
instructions whatever are given, but
it is less predictable. The instruc-
tions presented here give consistent
success.

Subjects may be told that they
are participating in a “game,” an
“experiment,” or in “the validation
of a test of intelligence.” All
work. Spectacular results may be
achieved by describing the situation
as a “test of intelligence,” but this is
not true for all Ss.

In general, the simpler the instruc-
tions the better. No mention should
be made that S is expected to do any-
thing, or to say anything. Experi-
ence suggests that if more explicit
instruction is given, results are cor-
respondingly poor. Elaborate in-
structions tangle S up in a lot of ver-
bally initiated behavior that inter-
feres with the conditioning process.
The instructions will be modified,
of course, to fit the reinforcement.
It seems to be important for S to
have before him a record of the points
he has earned. (This is not, of course,
E’s record of the data.) It seems to
be better if he scores himself, whether
by pressing a key that activates a
counter or by the method described.
Most Ss who do not have such
a record either do not condition, or
they quit working.


Reinforcing Stimuli 


Any event of short duration whose
occurrence in time is under the control
of E may be used as a reinforcing
stimulus if S is instructed properly.
The most convenient is the tap of a
pencil or ruler on a table or chair
arm, but E may say “point,” “good,”
and so on. Lights, buzzers, counters,
all work. One student found that
getting up and walking around the
room and then sitting down was a
very effective reinforcer for his in-
structed S. (“Make me walk around
the room.”)2

The E may assign a “value” to
the reinforcing stimulus in the in-
structions—e.g., for each 10 points
S gets a cigarette, a nickel, or what-
ever. Members of a class may be
told that if they earn enough points
as Ss, they may omit writing a lab
report.

Where no instructions are given,
or where the instructions do not
require an explicit response to a
reinforcing stimulus (as in the Green-
spoon experiment—i.e., when E
wishes to use a smile, or an “mm-
mm,” with the intention of showing
“learning without awareness”) many
Ss will not become conditioned.


2 Compare with the Columbia Jester’s rat,
who remarks to a colleague at the bar of a
Skinner box, “Boy, have I got this guy con-
ditioned: every time I press the bar, he gives
me a pellet.”

The most important features of 
the operation of reinforcement are 
(a) that the reinforcing stimulus have 
an abrupt onset, (b) that it be de- 
livered as soon as possible after the 
response being conditioned has oc- 
curred, and (c) that it not be given
unless the response has occurred. 
Delayed reinforcement slows up ac- 
quisition; it allows another response 
to occur before the reinforcement is 
given, and this response, rather than 
the chosen one, gets conditioned. The 
best interval at which to deliver a 
reinforcing stimulus seems to be the 
shortest one possible—the E's dis- 
junctive reaction time. 

When S has been conditioned, and 
is responding at a high rate, he may 
show “conditioned recording” —i.e., 
he will record the “point” before E 
has given it to him. The E must 
watch for this. 

When S can observe E, it is en-
tirely possible that S’s behavior is
being reinforced, not by the chosen
reinforcing stimulus, but by others of
E’s activities, such as intention
movements of tapping the table,
nods of the head, and recording the
response. The effect of such extra-
neous reinforcers can be easily ob-
served during extinction, when the
designated reinforcing stimulus is
withdrawn. The precautions to be
taken here will depend upon the
purpose of the experiment. The E
should thus remain as quiet and ex-
pressionless as possible.


The Response 
The E has great latitude in his
choice of behavior to be conditioned.
It may be verbal or motor, it may be
a response of measurable operant

about. Then he will reinforce suc-
cessively movement of the right hand,
movement toward the fountain pen,
then touching the pen, lifting it, taking
the top off, and finally taking the top
off and putting it back on. The effect
of a single reinforcement in shifting,
in narrowing down the range of a sub-
ject’s activity, can be interesting to
observe. It tempts E to depart from
the procedure originally planned and
to spend this time successively dif-
ferentiating out more and more un-
likely pieces of behavior.


Fig. 2. Regular Reinforcement and
Subsequent Extinction
(ordinate: cumulative number of responses;
abscissa: time in minutes)

For Curves A, C1, and C2 at t=0, rein-
forced response given for first time (approxi-
mation phase omitted). From t=0 to arrow,
regular reinforcement. Following arrow, no
reinforcement.

(A) R: touching left hand to right ankle.
S^R: “check” said by E; recorded by S.

(B) R: naming book title or author. S^R:
pencil tap, recorded by S. Approximation
conditioning to caret; then regular reinforce-
ment of R only.

(C1) R: folding hands. S^R: pencil tap, re-
corded by S. Verbalization occurs after maxi-
mal rate is achieved.

(C2) same curve as C1, with scales of both
abscissa and ordinate multiplied by constants
to yield a curve comparable to Curve A.


It requires skill to shape behavior 
rapidly, whether one deals with rats, 
pigeons, or men. If E demands too 
much of S—that is, if he withholds 
reinforcement too long, S's responses
may extinguish. If E is too liberal—
if he reinforces responses that are too 
similar to one another—he may con- 
dition these responses so effectively 
that further progress is slow. Re- 
sponses that are conditioned as a re- 
sult of E's lack of skill in spacing re- 
inforcements, like those that are con- 
ditioned when E is sluggish in de- 
livering reinforcement, are termed, in 
lab slang, “superstitious” responses. 
Even after they have been extin- 
guished, and the correct response has 
been conditioned, they typically re- 
appear during later extinction of the 
conditioned operant. 
Verbal behavior is readily condi-
tioned, and by the same techniques.
Almost invariably, in this case, shap-
ing is necessary unless S has been in-
structed to “say words,” or unless the
verbal response is saying numbers
such as “two,” “twenty-five,” and so
on. When E shapes verbal behavior
he should preselect verbal responses
that can be unequivocally identified
in a stream of language, as for exam-
ple, saying “aunt,” “uncle,” or the
name of any member of a family, or
saying names of books and authors
(even in a particular field, as E
chooses). By reinforcing sentences
containing these words, E achieves
control of a topic of conversation
(Fig. 1b and c). The E may also
bring S to say, and to say repeatedly,
particular quasi-nonsense sentences
such as: “I said that he said that you
said that I said so.” The Ss may be
conditioned to count, to count by
threes, or backwards by sevens, and
so on, when shaping is used.
Regular reinforcement (Fig. 2).
Once the response occurs, E will first
reinforce it regularly. One hundred
regular reinforcements have proven
ample to build up a resistance to ex-
tinction sufficiently great to permit E
to shift S to most schedules without
risk of extinction. (As Fig. 2 shows,
the number of R’s in extinction is
roughly proportional to the number
of regular reinforcements in this
range of values.) This number of re-
inforcements is also entirely adequate
to yield a high and stable rate of
response. “One-trial” conditioning,
found in the rat and pigeon with com-
parable procedures, often shows it-
self: the rate of response assumes its
stable value after a single reinforce-
ment of the chosen response. When
this occurs, S is not necessarily able
to state what he is doing that yields
him points.

Sometimes, an S, after giving a
large number of responses under reg-
ular reinforcement, will show the
phenomenon of “satiation” (habitua-
tion); that is, he will give nega-
tively accelerated curves, declining
to a rate of zero. Although reinforce-
ments continue to be given, they be-
come progressively less effective. If
S “satiates,” E should simply say,
“Keep earning points.” This almost
always restores the rate to its value
before decline.


Fig. 3. Fixed Ratio Reinforcement
(abscissa: time in minutes)

(A) R: slapping left knee. S^R: pencil tap,
recorded by S. Two reinforcements worth
1 cent to S. At t=0, S is shifted from regular
reinforcement to 15:1 rate of reinforcement.
Extinction begins at arrow. Note bursts of
high rates during extinction, followed by
periods of complete inactivity.

(B) and (C) (2 subjects). R: raising left
[word illegible]. S^R: pencil tap, recorded by S.
From t=0 to t=9, regular reinforcement (ap-
proximation phase omitted). At arrow, shift
to ratio of reinforcement, yielding increase
in rate. Approximation phase data: for Sub-
ject B, 9 reinforcements over 2 minutes; for
Subject C, 17 reinforcements over 5 minutes.


Fig. 4. Interval Schedules
(abscissa: time in minutes)

(A) Variable interval (15 second average).
R: rub nose with right hand. S^R: “good,”
recorded by S. Five “goods” worth 1 cent to
S. At t=0, interval schedule begins after
75 sec. of regular reinforcement. At arrow,
extinction begins.

(B) Fixed interval (15 second). R: raise
right forearm. S^R: pencil tap, recorded by S.
At t=0, interval reinforcement begins. At
arrow, extinction begins. Note that extinction
is like that usually obtained following fixed
ratio reinforcement.

(C) Variable interval (15 seconds). R:
raise right forearm. S^R: pencil tap, recorded
by S. At t=0, variable interval schedule
begins. At arrow, extinction begins.


Schedules of reinforcement. The E 
is now free to follow any one of a 
number of schedules of reinforcement 
(3, 4), the simplest of which are fixed 
ratio (where every nth instance of a 
response is reinforced—Fig. 3) and 
fixed interval (where the first re- 
sponse occurring in each successive 
n-second interval following a rein-
forced response is reinforced—Fig.
4). The behavior observed under 
these schedules corresponds closely 
with that observed in lower animals. 
As with lower animals, E will find it 
impossible to shift directly to a high 
ratio of reinforcement, or to a long 
fixed interval, without evidence of
extinction. Fixed intervals of 15 
seconds and fixed ratios up to 6:1 
may be established immediately 
without danger of extinction. When 
S is shifted, he will at first show great 
increase in the number of kinds of be- 
haviors he exhibits, even though the 
conditioned R continues to occur at 
the expected rate. Verbal behavior 
increases greatly, too. If S has been 
working at a steady rate under regu- 
lar reinforcement (for simple be- 
havior, usually 15 to 25 responses per 
minute) when shifted to a short fixed 
interval schedule, he may exhibit 
counting behavior and state that he 
is earning a point every, say, fifth 
response. If variable-interval or 
variable-ratio schedules are followed 
it is necessary for E to have prepared 
in advance a program, guiding him in 
determining which one of a series of 
responses he should reinforce (for 
variable ratio), or after how many 
seconds he should reinforce a response 
(for variable interval). 
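
The schedule rules above can be written out as a short sketch. This is only an illustration in Python under stated assumptions: the function names are hypothetical, the response count passed to the ratio rule includes the response currently being judged, and the variable program is drawn uniformly around the chosen mean, which the original text does not specify (it says only that E prepares a program in advance).

```python
import random
from typing import List


def fixed_ratio_due(responses_since_reinforcement: int, n: int) -> bool:
    """Fixed ratio: reinforce every nth instance of the response.

    The count includes the response now being judged (an assumption).
    """
    return responses_since_reinforcement >= n


def fixed_interval_due(seconds_since_reinforcement: float,
                       n_seconds: float) -> bool:
    """Fixed interval: reinforce the first response that occurs once
    n seconds have passed since the last reinforced response."""
    return seconds_since_reinforcement >= n_seconds


def variable_program(mean: int, length: int, spread: int,
                     seed: int = 0) -> List[int]:
    """Prepare in advance a program of `length` values.

    Each value is the number of responses (variable ratio) or of seconds
    (variable interval) required before the next reinforcement. Values are
    drawn uniformly from mean - spread to mean + spread, never below 1.
    The uniform draw is an illustrative choice, not the author's method.
    """
    rng = random.Random(seed)
    low = max(1, mean - spread)
    high = mean + spread
    return [rng.randint(low, high) for _ in range(length)]


# A hypothetical program for a variable interval averaging about 15 seconds.
program = variable_program(mean=15, length=20, spread=10)
print(program)
```

Writing the program out on paper before the session leaves E free to watch S rather than improvise the next requirement.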

The results obtained are typical: 
high rates of response occur under 
ratio schedules, and so does rapid ex- 
tinction when reinforcement is with- 
drawn. Low, but stable, rates follow 
the interval schedules, with large, 
smooth extinction curves. Temporal 
discrimination, verbalized or not, 
may occur on fixed interval schedules.
An exception is found in those cases 
where S behaves, on fixed interval, 
as if he had been on a fixed ratio—he 
may give an extinction curve appro- 
priate in form to fixed ratio reinforce- 
ment. The schedule has not “taken 
over.” 

Discrimination. After following a 
schedule of reinforcement for a time, 
E may proceed to set up a discrimina- 
tion, extinguishing the response in the 
presence of one set of stimuli and con- 
tinuing to reinforce it on schedule in 
the presence of another (Fig. 5). 


Fig. 5. Discrimination Training
(ordinate: cumulative number of responses;
abscissa: time in minutes)

(A) (B) R: rub nose with right forefinger.
S^R: pencil tap, recorded by S. At t=0, dis-
crimination training begins, with S^D and S^Δ
alternated through successive 30-second inter-
vals and regular reinforcement in presence of
S^D. No reinforcement with S^Δ. S^D: E’s ciga-
rette rests in ash tray. S^Δ: E’s cigarette in his
mouth. Curve A: responses under S^D; Curve
B: responses under S^Δ.

(C) (D) R: turning single page of magazine.
S^R: pencil tap, recorded by S. At t=0, regu-
lar reinforcement begins after approximation
conditioning. At arrow, discrimination train-
ing begins, with S^D and S^Δ alternated through
successive 60-second intervals, with regular
reinforcement under S^D; no reinforcement in
presence of S^Δ. S^D: desk lamp on; S^Δ: desk lamp
off. Curve C: responses under S^D; Curve D:
responses under S^Δ.


Here, it is advisable to have had S
on a variable ratio or variable interval
schedule, so that S’s discrimination
is not based solely on the omission of
reinforcement. Discrimination in
humans, as in rats, develops faster on
ratio schedules.

For the rapid development of dis-
criminations, it is desirable to choose
as S^Δ (negative discriminative stimu-
lus) a fairly conspicuous event, such
as E putting (and keeping) a cigarette
in his mouth, or putting his recording
pencil down, or placing a book on the
table and leaving it there, or crossing
his legs. The use of less conspicuous
S^Δ leads to the slow formation of the
discrimination. Again, the data ob-
tained are not readily distinguishable
from the data obtained on rats (8),
except that the time scale is shorter;
that is, the process is more rapid. 
“Learning set” data may be ob- 
tained in an hour or so by repeatedly 
reversing a discrimination: the dis- 
crimination process occurs more and 
more rapidly with successive re- 
versals. 
Chaining. After having had S on
a schedule of reinforcement, E may
decide to chain two responses. He
does this by conditioning the first
(A) and then extinguishing it, simul-
taneously conditioning the second
(B). When the second is conditioned,
E will then proceed to withhold re-
inforcement until the first recurs. He
will now make reinforcement con-
tingent on the occurrence of the se-
quence A-B, and so on. The Es have
succeeded in chaining together sev-
eral responses by this procedure.
Learning sets for chaining also occur
in human beings; during the extinc-
tion of a simple operant, an S who
had been conditioned to chain a series
of responses and had then been ex-
tinguished regressed to these old re-
sponses and gave them in new se-
quences with each other, and with
the response undergoing extinction.
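
The final contingency in chaining, reinforcement only on completion of the sequence A-B, can be sketched as follows. This is a hypothetical illustration in Python, not the procedure used by the student experimenters: the function name is an assumption, and so is the rule that the responses of the chain must follow one another with no other recorded response in between.

```python
from typing import Iterable, List, Sequence


def reinforced_positions(stream: Iterable[str],
                         chain: Sequence[str]) -> List[int]:
    """Return the positions in `stream` at which the chain is completed.

    `stream` is the ordered record of S's responses (e.g., "A", "B").
    A reinforcement is due whenever the most recent responses match
    `chain` exactly; the tally then starts afresh. Requiring strictly
    consecutive responses is an assumption made for this sketch.
    """
    chain = list(chain)
    recent: List[str] = []
    hits: List[int] = []
    for i, response in enumerate(stream):
        recent.append(response)
        if len(recent) > len(chain):
            recent.pop(0)  # keep only as many responses as the chain has
        if recent == chain:
            hits.append(i)
            recent.clear()  # begin a new chain after reinforcement
    return hits


# Hypothetical record: B alone and A-A earn nothing; each A-B pair does.
record = ["B", "A", "B", "A", "A", "B", "B"]
print(reinforced_positions(record, ["A", "B"]))
```

Longer chains are handled by passing a longer `chain`, which corresponds to the "and so on" of the text.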
Extinction. During extinction, E
should be careful to show minimal
changes in manner and behavior
other than those necessitated by the
failure to reinforce. By thoughtlessly
putting down his recording pencil, E
may obtain a very small extinction
curve. Extinction curves obtained do
not differ in any remarkable way
from comparable data obtained on
the lower animals.

The human shows many interest-
ing incidental pieces of behavior dur-
ing extinction, whether or not he is
aware—i.e., has verbalized—that no
more reinforcements are forthcoming.
He may make statements to the effect
that he is losing interest, that he is
bored, that he has a pressing engage-
ment; he may “get mad,” or make
mildly insulting remarks, or he may 
suddenly decide that “‘this is a stupid 
game,” or a “‘silly experiment.” He 
may indulge in conversation full of 
remarks deprecating institutions 
(e.g., the college) or himself. One S 
said, “I’m going to give you one more 
minute to give me a point, and if you 
don’t, I’m going to go do math.” (He 
left, in fact, after three minutes, 
when he had almost met the criterion 
of no responses in three minutes.) 
Many Ss volunteer the information 
that they ‘‘feel frustrated” when they 
can’t get any more points. 

The Ss also show behavior that 
some have called “regression”; that 
is, they give (“fall back on”) re- 
sponses that were reinforced during 
approximation conditioning. Regression
is most easily demonstrated if E
first conditions one response and then 
extinguishes it while conditioning 
another. In this case, during the ex- 
tinction of the second response con- 
ditioned, S will usually shift back and 
forth between the two. 


Recording 


The essential records are the num- 
ber of responses that occur in unit 
time, and the specification of the responses
that were reinforced. If E
makes a check on a piece of lined 
paper whenever the conditioned re- 
sponse occurs, moves down one line 
at the end of successive 15-second 
recording intervals and puts a bar 
across the check whenever a rein- 
forcing stimulus is delivered, he will 
have a record from which the familiar 
graph of cumulative number of responses
as a function of time can be constructed.
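
As a rough illustration of how such a tally sheet becomes a cumulative record, the sketch below assumes that each line of the sheet has been transcribed as a pair (responses, reinforcements) for one 15-second interval. The function names and the sample counts are hypothetical and serve only to show the arithmetic.

```python
from itertools import accumulate

INTERVAL_SEC = 15  # length of one recording interval, as in the text


def cumulative_record(tallies):
    """Return (elapsed_seconds, cumulative_responses) points.

    `tallies` is a list of (responses, reinforcements) pairs, one per
    15-second line of the tally sheet (a hypothetical transcription).
    """
    responses = [r for r, _ in tallies]
    totals = list(accumulate(responses))
    return [((i + 1) * INTERVAL_SEC, total) for i, total in enumerate(totals)]


def responses_per_minute(tallies):
    """Mean response rate over the whole record."""
    total = sum(r for r, _ in tallies)
    minutes = len(tallies) * INTERVAL_SEC / 60
    return total / minutes


# Made-up tally sheet: four intervals (one minute) of recording.
sheet = [(2, 2), (3, 1), (5, 1), (4, 0)]
points = cumulative_record(sheet)
```

Plotting the second element of each point against the first gives the usual cumulative curve; the slope at any stretch is the response rate for that stretch.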

Recording should be done behind a 
screen such as that provided by a 
clip-board or book held vertically. 
This recording procedure, together 
with the fact that E is busy watching 
S closely, has the merit that it is difficult
for E to draw over-hasty conclusions
about the “goodness” of the
data that are being collected, so that 
badly intentioned student Es cannot 
manufacture “good” results. After 
learning to record with one hand, and 
to deliver reinforcements with the 
other, E will have no difficulty re- 
cording until S achieves very high 
rates of response, in which case, E 
may not be able to record or even 
count fast enough, and some re- 
sponses will be missed. For experi- 
mental purposes, then, it is some- 
times wise to choose a response for 
conditioning that requires not less 
than a second to complete. 

The E may keep additional records 
such as, for example, a description of 
the responses reinforced during shaping,
and of other behaviors that appear
in extinction.

Awareness.4 Let us define “awareness”
as the disposition of S to verbalize
one or more of the rules followed
by E. The S may be partially or
completely “aware”; that is, he may be


4 Since these experiments are most interest- 
ing when S is not aware of what behavior is 
being reinforced, we have followed the pro- 
cedure of dividing laboratory sections into 2 
groups, members of which will serve as Ss for 
the members of the other, and vice versa. One 
group is instructed to work on verbal behavior, 
and to bring it under discriminative control, 
and the other to work on motor behavior, and 
to study extinction as a function of one or 
another schedule of reinforcement. The effect 
of the sets established is sufficient for many Ss, 
despite their otherwise full information on 
human conditioning, to remain unaware of 
what E is reinforcing, and what procedures he
is following.

Incidentally, it is futile to tell an S who has 
been conditioned, or who is familiar with 
conditioning that you want to “demonstrate 
conditioning,” using him. Most Ss become
very self-conscious under these circumstances 
and will not work. It is not at all difficult to 
demonstrate the procedure to a group that
sits quietly and watches, but in this case it is 
necessary to use a naive S. 


able to state one or more of these 
rules. He may be aware that a point 
is a reinforcing stimulus (as described 
in a textbook with which he is familiar);
that E is trying to make him do
something; that he is now doing 
something more often than he was 
before; that a certain response is 
being reinforced and is “right”; that 
he will get no points while E is smok- 
ing; that a point comes after every 
tenth response; that points average 
one per minute; that his response is
being extinguished, and so on. He
may be aware or unaware of any or of
all of these.

Enough observations on awareness have been made, both
through the subject's emitted verbal
behavior during the experiment, and
by asking Ss about the experiment
after its conclusion, to permit some
general statements:

1. In motor operant conditioning
about half the subjects do not become
aware of what response is conditioned,
that is, of what they are
doing that earns them points, until
many reinforcements have been delivered
and long after the stable rate
of response has been achieved. They,
then, remain for some time blissfully
ignorant of what they are doing.
Conditioning and extinction may
take place without S ever “figuring
it out.”

2. Very few subjects become aware
of the particular schedule they are
being reinforced on for many minutes.
On fixed interval schedules
some Ss show the beginning of a
temporal discrimination long before
verbalizing it, and others may verbalize
the interval and only later
show a corresponding gradual change
in behavior.

3. Many subjects, particularly
those whose motor behavior is being
reinforced, keep up a running
commentary on the procedure, and
exhibit very elaborate “reasoning”
behavior. Their behavior is not different
from that of silent Ss, or of Ss
who show a less “rational” approach
to the situation.

4. Subjects will occasionally show
insight or “aha!” behavior; i.e., sud- 
denly state that now they know what 
gives them points. Others become 
aware gradually: “I think it has 
something to do with my chin.” 
Some are never quite sure: “Look, 
what was right?” 

5. Most important, with some ex- 
ceptions that will be described, 
awareness seldom seems to alter the 
behavior. Sudden “insights,” when 
they occur, are not necessarily asso- 
ciated with abrupt changes in rate; 
abrupt changes in rate, e.g., “one-trial
conditioning,” may occur without
such awareness. Statements such
as, “Oh, now you're extinguishing
me,” made by highly sophisticated Ss
are not correlated with abrupt and
neat declines of rate to zero; S
(“on the chance that I’m wrong, or
that you'll change the procedure”)
proceeds to generate the remainder of
a typical extinction curve.

6. In experiments where S is, by
instruction, fully aware of the experimental
contingencies (but where he
does not know the kind of results he
is “supposed” to give) he will behave
immediately, on each instruction, as
other Ss do only after long periods
of reinforcement. He will immediately
give, for example, a high rate of
response when he is placed on fixed
ratio; he will give only one or two
responses under SΔ, or in extinction,
and so on. He has “learning sets,” or,
to put it another way, the instructions
behave like well-established
discriminative stimuli. He starts
with the behavior that is asymptotic
for uninstructed Ss.

In any event, the associated verbal
behavior, whether or not it in any
way “directs” motor behavior, is
highly sensitive to the experimental 
variables. During approximation 
conditioning and extinction, it is apt 
to occur at a relatively high rate, and 
to be aggressive in content: “The 
procedure is silly,” E is “wasting my 
time,” and so on. Under regular re- 
inforcement, and on schedules, after 
stable performance is achieved, S has 
rather different sorts of things to say. 
The S's verbal “approach” to the
situation is invariably interesting, as 
are the discrepancies between his 
statements about his performance 
and the performance itself. Since our 
procedures typically terminate with 
extinction, most Ss finally term the 
experiment “silly,” “childish,” and 
“stupid,” despite the fact that they 
have “voluntarily” been working 
very hard indeed to earn points. 

Possible research uses of the method. 
This kind of experiment, in which 
verbal behavior can be treated as 
either a dependent or independent 
variable, will perhaps find its greatest 
usefulness in the experimental analy- 
sis of so-called “cognitive processes” 
—that is, of S's awareness of what he 
is doing, and of the rather different 
dependencies of verbal and of other 
behaviors on a common set of experi- 
mental variables. 

A second investigative area that 
this procedure makes amenable to 
experimental investigation is that of 
the classes of events that reinforce 
human behavior. What is a “point”?
Why do “points” reinforce? How
can their “value,” i.e., their effectiveness
in controlling behavior, be
manipulated by instructions? Is a
“point” from Experimenter A “worth
as much” as one from Experimenter
B? How will the addition of monetary
reward vary the tendency of S
to show satiation for “points”? What
will showing S “group norms” for
points collected do to his performance?
It offers the possibility of
measuring rapport.

A third area is one on which some 
preliminary investigations have been 
done: the gross changes in behavior 
that occur in extinction and in shift- 
ing from one reinforcement schedule 
to one giving fewer reinforcements. 
Many subjects become, under these 
conditions, “disturbed,” “upset,”
“emotional,” “aggressive” and “frus- 
trated.” Observations of changes in 
the rate of speaking, of moving about 
in a chair, and of such idiosyncratic 
operants as scratching the head, 
tapping the forehead, and so on, sug- 
gest that these rates are a function of 
the ratio of reinforced to unreinforced 
responses (6). 


Discussion 


Operant conditioning as it was 
described in The Behavior of Organ- 
isms is concerned with the behavior 
that the layman calls “voluntary.” 
This characterization is still valid— 
the behavior during conditioning is 
not “forced,” as one might character- 
ize the conditioned knee-jerk, or 
necessarily “unconscious,” as might 
be applied to the conditioned GSR. 
Ss work because they “want to.” S’s 
behavior is nonetheless lawful and 
orderly as a function of the manipula- 
tions of E, and his behavior is pre- 
dictable by extrapolation from that 
of lower animals. 

These assertions, like the procedure 
itself, involve no theoretical assump- 
tions, presuppositions, or conclusions 
about “what is going on inside S’s 
head.” It does not assert that all 
learning occurs according to this set 
of laws, or that this process of con- 
ditioning is typical of all human 
learning. It does not assert that S is
no better (or worse) than a rat, or
that his behavior is unintelligent, or
that since, say, Ss get “information”
from a reinforcing stimulus, so too do
rats. The behavior is highly similar
in the two cases; we leave it to others
to make assertions to the effect that 
rats think like men, or that men think 
like rats. 

The procedures can be character- 
ized as bearing close relationship to a 
number of parlor games. Indeed, 
such conditioning might be consid- 
ered by some as nothing more than a 
parlor game. This would not be the 
first time, however, that examples of 
rather basic psychological laws 
turned up in this context. Parlor 
games, like other recreational activities,
are, to be sure, determined
culturally, but it is doubtful that a
parlor game could be found whose 
rules were in conflict with the general 
laws of behavior. 

That the procedure is more than a
parlor game is demonstrated by the 
fact that it provides a situation in 
which a number of the variables con- 
trolling voluntary behavior can be 
experimentally isolated and manipu- 
lated; that stable measures of a wide 
variety of behavior are yielded and,
finally, that the procedure yields
orderly data that may be treated in
any one of a variety of theoretical
systems.


Theoretical Discussion 


The data lend themselves very well
indeed to theoretical discussion in
terms of “perceptual reorganization,”
“habit strength,” “expectancy,” or
“knowledge of results,” as well as to
simple empirical description in the
vocabulary of conditioning. Chacun
à son goût.


Summary 


A series of procedures are presented
that enable an experimenter to
reproduce, using the motor (and verbal)
behavior of human subjects,
functions that have been previously
described in the behavior of rats and
pigeons. Some remarks on “awareness”
in the situation are made.
Northwestern University


This report presents materials de- 
veloped for use in studies of verbal 
concept formation. It is not our in- 
tent to review materials used by 
others to study concept formation, 
as this has been done by Vinacke (3). 
However, it may be noted that tasks 
or materials which have been used 
are quite diverse in nature. With few 
exceptions (e.g., Weigl-type card sort- 
ing) no systematic series of experi- 
ments has been built around a single 
task. While this lack of task stand- 
ardization attests to the ingenuity 
of individual workers in constructing 
new materials, the situation may not 
be entirely satisfactory for efficient 
development of laws and theories. 
In the more highly developed areas 
in psychology only a few basic tasks, 
procedures, or materials have been 
used. Thus, classical conditioning, 
the Skinner box, nonsense syllables, 
the pursuit rotor, and the psycho- 
physical methods (to mention a few) 
all have had widespread use. While 
some may justifiably raise questions 
concerning generality of findings 
based on such a limited number of 
procedures and tasks, it cannot be 
doubted that interlaboratory com- 
munication and continuity is greatly 
facilitated by the use of common 
basic tasks and procedures.

1 This work was done under Contract
N7onr-45008, Project NR 154-057, between
Northwestern University and the Office of
Naval Research.

The concept formation studies
based on verbal stimuli are few in
number (3). And, so far as we know,
there have been no studies using verbal
stimuli to which the already existent
responses were known. We
feel that an evaluation of the response
tendencies of the Ss is extremely
important. We have elsewhere
(2) pointed out in some detail
that concept formation basically consists
of the perception of relationships
among stimuli. Furthermore,
we have suggested that relationships
can be perceived only when one or
more common responses are evoked
by the different stimuli. If this is the
case, and if verbal stimuli have certain
response tendencies already established,
then we need to know what
these tendencies are. Therefore, our
basic objective in developing materials
for verbal concept formation
was to determine what are the responses
to the stimuli which are presented
to S in a concept-formation
situation. Thus, if we present the
word tomato to S, we want to know
what this stimulus makes S “think
about.” If we know such response
tendencies for a large number of Ss
for a large number of words, we can
devise studies which will give us direct
information on why different
concepts are learned at different
rates. And of course, the materials
can be used in studying the influence
of environmental and subject variables.

Our final set of materials satisfies
the objective in a more restricted
way than we had originally hoped.
Nevertheless, we have found the
materials quite useful in several studies
(recently completed or under
way). It is possible that other investigators
will also find them worthwhile
for certain kinds of researches.

The method we used in developing
these materials evolved only after
trying several procedures which
proved unsatisfactory for one reason
or another. We shall first report
briefly on these unsatisfactory procedures
in order that it may be understood
why we made the compromises
we did. For want of a better term we
call these procedures scaling procedures.


PRELIMINARY WORK 


In both preliminary and final scaling
we have worked only with concrete
nouns. The Ss were always instructed
to consider the words as
nouns and never verbs, e.g., the word
fly was to be thought of as the insect,
not the act of flying. In our first
attempt we tried to get (a) common
free associations to the nouns,
and (b) successive free associations
to each noun. We wanted the successive
associations in order to determine
the hierarchy of responses to
each stimulus. This attempt proved
unsatisfactory. We found very few
nouns which elicited the same response
(a necessity, as indicated
above) and successive responses to
the same word seemed to be dependent
on the previous responses as well
as on the stimulus itself.

The next step was to ask for a single
word or phrase under essentially
free-association instructions. We did
make the instructions slightly restrictive
in not allowing synonym
responses (for reasons which we will
discuss). We gave 120 nouns to
85 Ss under these conditions, following
which we set out to categorize the
responses. The results were disappointing.
Not only did we get few
cases of identical responses to different
stimuli, but in many cases
little identity of response to the same
stimulus by different Ss. For 85 Ss
we got as many as 50 different responses
to the same stimulus. There
was also considerable unreliability
in categorizing the responses.

The third procedure involved a 
further restriction on the type of 
association we allowed Ss. Instruc- 
tions told S that he was to give the 
first descriptive word or phrase which 
occurred to him upon seeing the 
stimulus word. The description could 
be in terms of physical properties or 
in terms of usage of the object symbolized.
The instructions eliminated
synonyms, clang associations, sequential
associations, and several
other types of responses which occur 
in free association. We gave 1,000 
nouns to 84 Ss and again went 
through the categorizing procedure. 
While these instructions did yield the 
restriction in responses desired, we 
still obtained too few identical re- 
sponses to different stimuli for our 
projected purpose and the categoriz- 
ing remained somewhat unreliable. 

However, the results of this third 
preliminary study gave us informa- 
tion which allowed us to plan what 
proved to be the final procedure. We 
noted that in the data for the third 
study the responses which overlapped 
(identical responses to different stim- 
uli) were what we shall call “sense 
impressions.” By sense impressions 
we mean such characteristics of objects
as color, size, shape, texture, and
so on. The response “yellow,” for example,
occurred to several different
stimuli. Furthermore, such responses 
could be categorized with near-perfect 
reliability. Therefore, our final scal- 
ing technique was to use nouns to 
which we felt at least one sense- 
impression response would be fairly 
common. With these nouns we further
limited our instructions so that
only descriptions in terms of sense
impressions were allowed. We turn,
then, to the details of the final procedure.


SCALING PROCEDURE 


Materials and subjects. A total of 
328 nouns was selected for the final 
run. Mimeographed data sheets were 
prepared with 328 spaces on them. 
Seven different groups of Ss were 
run, the groups varying in number 
from 13 to 30, for a total of 153. 
These Ss were all taking elementary 
psychology at Northwestern Uni- 
versity. They constitute a sample 
from a population on which we ex- 
pected to do the actual concept-for- 
mation experiments after the ma- 
terials were prepared. 

We felt it possible that Ss might 
develop more restrictive response 
sets than our instructions intended. 
That is, S might develop a set for 
responding with a certain class of 
responses, such as color responses, 
to a series of stimuli. This might 
continue until a color response was 
entirely inappropriate for a stimulus. 
He might then have a run on, say, 
shape of objects. We could not, of 
course, eliminate these runs if we 
were to present more than one stimu- 
lus to each S. Furthermore, for all 
we could tell, what appeared to be 
runs might actually represent the 
dominant responses irrespective of a 
run-set. In view of these considera- 
tions we presented the words in a dif- 
ferent order to each of the seven 
groups, each order being determined 
on a random basis. If run-sets were 
operating they should not heavily 
influence the response frequency in a 
given category for any one word. Ac- 
tually, as the data turned out, the 
response frequency for a given cate- 
gory for a given word was highly 
comparable among the seven groups,
so that we do not now feel that the
run-sets were very strong.
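
The counterbalancing step described above, a separate random order of the 328 words for each of the seven groups, can be sketched as follows. This is a minimal illustration only: the placeholder word list, the function name, and the fixed seed are assumptions and not part of the original procedure.

```python
import random


def presentation_orders(words, n_groups=7, seed=0):
    """Return one independently shuffled copy of `words` per group."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    return [rng.sample(words, k=len(words)) for _ in range(n_groups)]


# Placeholder stimuli standing in for the 328 nouns.
words = [f"noun_{i:03d}" for i in range(328)]
orders = presentation_orders(words)
```

Because each group sees the same words in a different order, a run-set that happens to favor one category at a given list position should not systematically inflate that category for any one word.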

Procedure and instructions. After 
instructions and preliminary practice 
(to which we will return in a moment),
the 328 words were presented to Ss.
The words were flashed on a screen
and pronounced, one at a time, at a
rate of 6 sec. per word. During the
6 sec., S perceived the word and 
wrote down the first association 
which occurred to him. Only a single 
word was allowed as a response. The 
Ss were warned not to rely on the 
auditory stimulus alone, but to look 
back at the screen before writing 
down a response. The rapid rate of
presentation was intended to prevent 
S from doing any selecting of his re- 
sponses. Rests were given after every 
50 stimulus words. With instructions,
practice, and main presentation,
total experimental time averaged
about 80 min.

The instructions, of course, are
critical in our attempts to obtain the
kind of responses we desired:
sensory-impression responses. Yet, the
verbatim instructions are of little
significance since the establishment
of the set came about largely through
the discussion of responses to the
practice words. The Ss were told
that we were developing materials
for use in concept formation or thinking
studies. They were told that we
were not interested in their “personalities,”
etc., and they could remain
anonymous if they chose. Free
association was then explained to
them, although most were acquainted
with the idea. They were next told
that while we wished associations,
only one particular class of responses
would be allowed. These were then
described as sense impressions, i.e.,
the responses one might use to describe
an object upon perceiving it for
the first time. Then, a number of
illustrations of responses not allowed
were given, e.g., synonyms, chain
associations, opposites. Preliminary
work with a small group showed that
the instructions as outlined were insufficient;
major success in establishing
the set came only when we discussed
responses made to practice
stimuli.

Twenty practice nouns were used. 
The Ss were shown the first practice 
word and were asked to write down 
as quickly as possible the first asso- 
ciation which seemed to be a sense 
impression. Then we asked Ss to 
announce the responses they had 
written. Responses which did not 
meet the requirements were discussed
and Ss were shown why these
responses did not conform to the instructions.
Thus, if someone responded
to acorn with “oak” they
were shown that “oak” was not a
sense description of acorn. Responses
such as “hard,” or “small,” or
“brown” were noted as being sense
impressions. However, we made a
real effort to avoid “setting” Ss
for any particular class of sense impressions,
for we were concerned
about the run-sets mentioned earlier.

Following the presentation of the 
first word, and the discussion of the 
responses to it, the second stimulus 
was presented and the responses dis- 
cussed. This continued until all Ss 
appeared to “get the idea” and were 
responding as desired. This required 
approximately 10 words, varying 
somewhat from group to group. The
remaining 10 practice words were
presented without comment and at
gradually increasing speed so that by
the 20th word Ss were responding
at the rate required for the presentation
of the major list. This speed, to
repeat, was 6 sec. per word.

To many words one or more Ss
failed to give a response, a result
which we expected because of the
rapid speed of presentation. Therefore,
we do not have 153 responses to
all stimuli. The actual numbers 
range from 147-153. 

Categorizing. The categorizing of 
the responses was a fairly routine 
matter; in only a few cases was there 
doubt. Responses were combined if 
they apparently expressed the same 
idea. Thus, the responses “big” and 
“large” were assumed to mean the 
same thing. While we have not com- 
puted any index of reliability for the 
categorizing, it is noted that agree- 
ment of response frequencies for the 
various subgroups was high. 

A total of 115 words in the original
328 elicited responses which we felt 
warranted their exclusion from the 
final list of stimuli. There was a num- 
ber of reasons for exclusion, some of 
which we will mention: 

1. Ambiguity of stimulus. For ex- 
ample, the word dollar was taken by 
some Ss to mean a silver dollar, and 
by others to mean the bill. 

2. Some stimulus words elicited 
such a wide variety of sense impres- 
sions that the frequencies in all cate- 
gories were too small for subsequent 
use. 

3. To some words sensory-impression
responses were not given with
sufficient frequency. For example,
Ss did not respond to the stimulus
tide with any high proportion of
sense impressions.

4. To some stimuli two or more
dominant responses occurred which
were somewhat contradictory. For
example, to the stimulus iodine, both
“red” and “brown” occurred frequently.

5. We eliminated most words
which themselves are used to denote
sense impressions, e.g., gold.


We have not rigidly applied the 
criteria indicated above; that is, we 
do not have a “purified” set of responses.
Those that we did retain
which we recognized as not being un- 
ambiguous cases we felt would still 
be useful for some purposes. 

Our final list contains 213 stimulus
words. Our list will show the per cent 
of total responses to each word which 
fell into specified categories. In addi- 
tion, for each word there is a miscel- 
laneous category. If a particular re- 
sponse was given by less than 5 per 
cent of Ss, it was put in the miscel- 
laneous category. Furthermore, we 
obtained some responses which were 
not sense-impressions (by our inter- 
pretation) and these were also put 
into the miscellaneous grouping. 

In order to simplify the presenta- 
tion it is necessary to code the re- 
sponse categories. There are 40 such 
categories. Responses not included 
in the 40 categories, but having a 
frequency 5 per cent or greater will 
be written out. Such instances are 
infrequent. 
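
The tabulation rule just described (per cent of responses per category, with anything under 5 per cent or outside the sense-impression categories pooled as miscellaneous) can be sketched as below. This is only an illustration under stated assumptions: the function name, the mapping `category_of`, and the sample responses are hypothetical, and a response that reaches 5 per cent without belonging to one of the 40 categories would simply be mapped to its own written-out label.

```python
from collections import Counter

MISC_THRESHOLD = 5.0  # per cent; responses below this go to miscellaneous


def tabulate(responses, category_of):
    """Summarize one stimulus word's responses.

    `responses` is the list of single-word responses given by Ss.
    `category_of` maps a response to a category label; responses missing
    from the map are treated as not being sense impressions.
    Returns (percent_by_category, percent_miscellaneous).
    """
    n = len(responses)
    counts = Counter(category_of.get(r.lower()) for r in responses)
    table, misc = {}, 0.0
    for category, count in counts.items():
        pct = 100.0 * count / n
        if category is None or pct < MISC_THRESHOLD:
            misc += pct
        else:
            table[category] = pct
    return table, misc


# Made-up responses to a single stimulus word from 20 Ss.
sample = ["big"] * 10 + ["large"] * 6 + ["gray"] * 3 + ["trunk"]
mapping = {"big": "Big", "large": "Big", "gray": "Gray"}
table, misc = tabulate(sample, mapping)
```

Mapping “big” and “large” to one label mirrors the way responses expressing the same idea were combined before the percentages were computed.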

It may be noted (Table 1) that 
category 5 is indicated as “Smelly.” 
We did not attempt to distinguish be- 
tween good and bad or pleasant and 
unpleasant smells. Some people find 
gasoline smells pleasant, others un- 
pleasant. But, the smell of an ob- 
ject, whether pleasant or unpleasant, 
is a descriptive term that is com- 
monly used. We have found it useful 
in making lists of concepts to retain 
the idea of “smelly” although we 
realize that it is somewhat different 
than other words which may occur 
as descriptive terms for opposites,
such as big and small, black and
white, sour and sweet, etc. Finally,
let it be clear that we do not claim 
that all response categories are 
“pure” sense impressions. It will be 
apparent in inspecting categories
that there is some variation in this
matter.


TABLE 1

CATEGORIES AND CATEGORY NUMBERS TO BE USED IN CONJUNCTION WITH TABLE 2

Category                   Category
Number    Category         Number    Category

 1        Round            22        Sour-bitter
 2        Small            23        Hairy-furry
 3        White            24        Wet-moist
 4        Hard             25        Woody
 5        Smelly           26        Strong, sharp, tangy
 6        Soft             27        Heavy
 7        Shiny            28        Greasy
 8        Big              29        Dirty
 9        Long             30        Deep
10        Yellow           31        Cold
11        Brown            32        Noisy
12        Metallic         33        Fuzzy
13        Green            34        Light (not heavy)
14        Sweet            35        Square
15        Red              36        Clear
16        Sharp            37        Sticky
17        Pointed          38        Narrow
18        Slimy            39        Rough
19        Black            40        Flat
20        Smooth
21        Dark

RESPONSE FREQUENCY BY CATEGORY 

In connection with Table 2, certain
explanatory notes are essential.
To the left is listed each stimulus
word presented to the Ss. The response
data are given under the column
headed “Categories and Per
Cent Frequencies.” A number not in
parentheses indicates the category
while the number in parentheses immediately
following indicates the per
cent of responses falling in the category.
Different categories are set off
by semicolons. Thus, if an entry
reads 2(45); 6(22), it means that 45
per cent of the responses fell in Category
2 (small), while 22 per cent fell
in Category 6 (soft). If a word appears
instead of a category number it
means that it is a special case not
included in one of the 40 categories.
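
For readers who wish to work with the table mechanically, the entry notation explained above can be read with a few lines of code. The sketch below is an assumption-laden illustration, not part of the original materials: the function name and the regular expression are hypothetical, and it accepts only well-formed entries such as 2(45); 6(22) or Rubbery(8).

```python
import re

# One entry: a category number or a written-out word, then a per cent
# figure in parentheses.
ENTRY = re.compile(r"\s*([A-Za-z]+|\d+)\s*\(\s*(\d+)\s*\)\s*")


def parse_entry(text):
    """Parse e.g. '2(45); 6(22)' into [(2, 45), (6, 22)].

    Numeric categories become ints; written-out special cases such as
    'Rubbery(8)' keep their word as the category.
    """
    pairs = []
    for part in text.split(";"):
        match = ENTRY.fullmatch(part)
        if match is None:
            raise ValueError(f"unreadable entry: {part!r}")
        label, pct = match.groups()
        pairs.append((int(label) if label.isdigit() else label, int(pct)))
    return pairs
```

Raising an error on anything that does not match keeps damaged or ambiguous entries from being silently misread.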
Next, there is a column headed
“% Miscellaneous Responses.”
TABLE 2
(See text for explanation)

Word   Categories and Per Cent Frequencies   % Misc. Responses   Thorndike-Lorge

Alley   21(49); 9(16); 38(14); 29(10)   10   13
Aluminum   7(59); 12(14); 34(12)   14   2
Ammonia   5(88)   12   5
Anchor   27(57); 12(15); 4(7); 8(5)   17   26
Ape   23(46); 8(30); 11(5)   19   6
Apple   15(67); 1(19); 14(5)   i9   iA
Armor   12(28); 7(25); 4(24); 27(14)   10   37
Asparagus   13(78); 9(9)   13   6
Asphalt   19(48); 4(29); 37(5)   19   5
Atom   2(87)   13   8
Auditorium   8(84)   16   3
Badge   7(32); 12(27); 1(21); 2(5)   15   3
Balloon   1(55); 34(17); Rubbery(8)   20   17
Banana   10(59); 9(12); 6(11); Slippery(5)   13   13
Bandage   3(73)   27   14
Barrel   1(72); 25(15); 8(6)   7   32
Baseball   1(70); 3(11); 4(10); 2(5)   4   15
Baton   9(50); 7(12); 12(7); Thin(11)   21   1
Beak   16(50); 17(17); 9(12); 4(9)   12   15
Bean   13(49); 2(18); 1(12); 9(6)   15   8
Bed   6(76)   24
Beet   15(87)   13   11
p + 8(8); 19 10
Ply 1(43); 6(24); 8(8); 20(5) K
Blood   15(91)   9
Blush   15(96)   4   27
Bone   4(47); 3(34)   19   5
Boulder   8(46); 4(19); 27(10); 1(10)   16   k
Bracelet   7(25); 1(25); 10(19); 12(13)   18   10
Bread   3(35); 6(31)   28   a
Brick   15(46); 4(35); 35(7); 27(5)   8   9
Buckle   7(32); 12(31); 35(10); 4(9); 10(6)   12   :
Bungalow   2(46); 3(9); 25(7); Low(5)   33   T
Butter   10(62); 6(21); 28(7)   1   S
Button   1(61); 2(15); 46); 309)   14   39
Cabbage   13(53); 1(15); 5(12); 3(5)   16   16
Cabin   2(39); 25(28); 11(11)   25   r
Camel   11(30); 23(15); 8(14); Humpy(20)   21   :
Canary   10(82); 2(5)   E   ;
Capsule   2(51); 1(22)   ar   S
Carrot 10(8): o(@): Oranen a aT A
Cauliflower   3(64); 13(5); bumpy y 33
Cave   2160; 30(6); Damp(14); Hollow(5)   on   13
cule 3(80) (9); 1108) 15 1
amois 6(68); 20 (9); 19 4
Cheese Pon 5(25); 26(6); Holey (6) 4 3
Cherry   15(77); 1(14)   H   ‘6
Chestnut DAN. 4(18); 1(14); 2(9) 1D 20
Chocolate   11(61); 14(29)   a   A
Cigar   5(40); 11(26); 904); 1(7)   : a 3
Cigarette   3(33); 9(15); 504); 1(6); Smoky(15)   A   A
Cinnamon   11(40); 14(21); 26(15); 5(6)   3   us
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TABLE 2 (continued) 


‘ee Categories and Per Cent Frequencies 
Closet 21(64); 2(24) 
Clove 26(32); 5(18); 13(8); 14(8); 11(7) 
Coal 19(85); 4(7) 
Coffee 19(32); 11(24); 5(12); 22(11); Hot(12) 
Collar 3(44); 1(16); Stiff(19) 
Cork 34(27): 11(25); 6(21); 1(7) 
Corn 10(81) 
Cradle 25(29); 2(24); 6(11); 11(5) 
Cranberry 15(69); 22(10);  14(7); 1(6); 2(S) 
Crown 10(35); 7(20); 1(13); 12(7) 
Crumb 2(79) 
Crystal 36(52); 7(24); 4(9); 3(5) 
Cucumber 13(52); 9(14); Prickly(5) 
Custard 10(39); 6(26); 14(11); 20(7) 
Daffodil 10(68); 5(12) 
Dagger 16(70); 17(10); 12(6) 
Dandelion 10(85) 
Derby 1(33); 19(29); 11(14) 
Diamond 7(65); 4(15); 36(9) 
Diaper 3(50); 24(17); 35(8); 5(7); Triangle(5) 
Dime 1(30); 12(23); 2(15); 7(13); Thin(9) 
Dome 1(70); 8(5); 17(5); High(9) 
Doughnut 1(71); 14(7) 
Dungeon 21(67); Damp(22) 
Earthworm 18(44); 9(17); 2(11); Crawly(5) 
Eel 18(68); 9(15) 
Elephant 8(83); Gray(11) 
Enamel 3(28); 7(24); 4(20); 20(14) 
Ether 5(70) 
Eye 1(32); 2(10); 11(8); 7(6); Blue(26) 
Fang 16(75); 3(10); 17(5); 9(5) 
Fishhook 16(70); 17(9); 12(5); 1(5); 2(5) 
Flannel 6(54); 15(12); 33(5); Itchy(8) 
oe 2(86) 
‘orest 13(52); 21(14); 8(12); Dens 
Formaldehyde 5(81) Oe: = ia 
Freckle 11(54); 2(19); 15(11) 
Frost 31(54): 3(34) 
Fur 6(75); 23(6); 33(5) 
Garbage 5(80): 29(7) 
Gardenia 5(65); 3(28) 
Garlic 5(58): 26(25) 
Gasoline 5(54); 24(7) 
Germ 2(84) 
Ginger 26(40); 11(15); 14(11); 5(11) 
Globe 1(95) 
Gnat 2(76) 
Goat 3(29); 5(20); 23(18); 29(5) 
Gorilla 8(42); 23(41) 
Grape 1(18); 6(7); 14(5); 13(5); Purple(43) 
Grapefruit 22(52); 10(23); 1(12) 
Grass 13(88) 
Grasshopper 13(55); 2(18) 


Gym 8(54); 5(21)


% Misc. 
Responses 


12 
27 

9 

9 
21 
21 
19 


Thorndike- 
Lorge 


20 
8 
AA 
A 
44 
11 
A 
21 
5 

A 


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TABLE 2 (cominued) 


Stimulus   Categories and Per Cent Frequencies   % Misc. Responses   Thorndike-Lorge
Hailstone 4(49); 1014); 309); 318); 27) io 3 
Hallway 9(54); 38(20): 21(16) 12 1 
Harpoon 16(66); Piet 13 8 
Hatchet 16(77); 4(5); 20 AA 
a O A, 1D: 78:270 16 i 
Helmet 4(31); 12(22); 107: A 16 14 
Hog 8(20); 29(19):; O 13 A 
Howat T39, 509; SU: Cand: Ques) M 4 
Icicle 31045); 9(15); 16014); 17(7); 36(5) ` 9 13 
Ivory pea 4(14): 20(12) 17 11 
vy 15 2 
Jellyfish 18(49): 6(31); 2(5) 26 41 
Jewel 7(67); 4(7) 21 35 
Kitten san aa 10 A 
nite H 
Knob 1(68);  4(9); 2(5) i 5 
Knuckle 4(62); 101); Bony (9 5) 13 9 
Lard 28(41); 3(27); 6(8); Thi 15 37 
Lawn 13(77); 200, 3 27 
Lemon 22(65); 10(32) 7 2
Limousine 19(27); 9(26): 8(21); 7(14); Sleek(5) 19 47 
Linen 3(59); siam Flakey(8) 15 A 
int 3(38); 2(32); i Fla 18 A 
Lips 15(59); 6(24) 12 7 
Lizard fast); 13023): 2(6); Scaly(8) 17 18 
ansion 8(83) 10 “eS 
anure 5(83); 11(7) 9 R 
Measles 15(53); 2(5); Pimply(33) 17 
Milk 3(83) 15
Minnow 2(62); 18(16);_7(7) 11) 23 $ 
Moccasin 6(53); 11(13); Leathery(11) 8(5) 14 AR 
Moon 1(30); 1000); 307; 143); 8 21 z 
oss 13(52); 6(22); 2A an 12 2 
ous 2(54); 23(7); Gray 11 
Mustard 10078); 2604) 7 1 3 
Nede SRS: 20 BOTO lo AA 
‘ ; A AA 
ont 733): 24(20); 30012); Blue(24) i or ( 
Olive 13(62); 1(17); 19(7) 9 13 A 
Onion 5(49); 305): 2609: 1. 6) ol 16 
Oyster 18(46); 303); 10); „20: 3(8) 3 
Pail 12023); 1018); 30040: 7(10); 3¢ 12 5 
Paste 37(64); 3(16); 18(7) 10
Peach 33(31); 10020); 108): MUN: $9 n ar 
Pear 10(44); 14(14); 10); 46) 8 
Pearl 37); 1(33): 142); 25): 4l 16 13 
E SO 
E 2(46); 1(28); 8 
Pillow, 6(87); 3(5) P 9 i 
pa 16(55); 202); 17(10); 12(5) 23 A 
nine 5(44); 13(25); Tall(8) 19 15 
ineapple 1036); 14(22); 3906); 1(7) 
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TABLE 2 (continued) 


Stimulus   Categories and Per Cent Frequencies   % Misc. Responses   Thorndike-Lorge
Platter 1(38); 40(29); (10) 23 8 
Pollen 10(32); 2(30); 5(5) 35 6 
Pony 2(48); 11(14); 23(12) 27 32 
Pot 1(29); 12(22); 19(13); 30(9); 7(7) 20 47 
Puddle 24(61); 2(9) 30 3 
Pup 2(50); 6(12); 23(11) 27 6 
Rabbit 23(30); 3(25); 6(18); 2(6); Fast(10) 10 43 
Rattlesnake 9(34); 18(28); 32(7) 31 E] 
Rhinestone 7(67): 4(10) 24 EI 
Rice 3(54): 2(24); 4(6) 15 AA 
Rod 9(62): 38(9); 1(7): 4(7); 12(6) 10 44 
Salt 3(53); 26(10); 4(7); Grainy(5) 26 AA 
Sardine 5(30); 2(22); 18(20); Salty(5) 23 2 
Saucer 1(59); 40(19); _ 4(5); 3(5) 12 7 
Sauerkraut 22(41); 5(24); Stringy(17) 17 1 
Scissors 16(78); 12(11); 17(5) 5 8 
Seaweed 13(49); 18(28); 24(5); Stringy(11) 8 6 
Sewer 5(61); 21(10); 29(8); 30(5) 15 7 
Sheep 33(49); 3(23); 6(14) 13 A 
Silk 20(41); 6(39);  7(6); Slippery(5) 9 A 
Ski 9(38); 25(17); Slippery(9) 36 6 
Skin 6(42); 20(17);  3(9); Pink(9) 23 AA 
Skull 4(36); 3(22); 1(11); Bony(14) 17 12 
Skunk 5(78); 19(14) 8 13 
Snail 2(42): 18(18): 1(14); Slow(16) 10 8 
Snow 3(71); 31(14); 6(8) 8 AA 
Spear 16(68); 9(12); 17(12) 7 40 
Spinach 13(90) 10 8 
Spool 1(74); 2(9); 25(7) 10 7 
Stadium 8(78); 1(7) 15 2 
Stone 4(63); 2 (7); 1(6); 27(6) 19 AA 
Straw 10(39); 9(11); Prickly(7); Thin(7); Brittle(5) 32 41 
Sugar 14(82); 3(11) 1 AA 
Sulphur 5(48); 10(36) 16 21 
Tack 16(64); 17(16); 2(10) 10 9 
Tar 19(52); 37(35): Thick(5) 8 2 
Teeth 3(72); 16(8); 4(5) 15 A 
Telephone 19(65); 32(19) 16 A 
Thimble 2(37): 12(19): 7(15); 1(12); 4(9) 7 4 
Tobacco 11(46); 5(23); 22(6) 25 36 
Tomato 15(83); 6(7); 1(5) 5 i 
Tongue 15(48); 6(11); 24(9); 9(8); 39(8) 17 A 
Tunnel 21(54); 9(23); 1(6); 38(5) 12 22 
Turpentine 5(67); 24 (6); 36(6) 20 4 
Tweezer 12(26); 16(25); 2(17); 17(8); 7(6) 17 1 
Velvet 6(67); 20(24) 9 32 
Village 2(74) 26 AA 
Vinegar 22(68); 5(14) 17 12 
Waist 2(43): 1(24); 38(12) 20 33 
Walrus 8(46); 19(13); 18(13) 28 2 
Whale 8(77); 18(5) 18 8 
Wheel 1(94) 6 A 
8(32); 5(30); 32(7) 31 2 
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is the percentage of responses which 
were not sense impressions or if they 
were, did not constitute a category of 
5 per cent or more frequency. The 
last column, “Thorndike-Lorge,” 
gives the frequency count of each 
stimulus word as listed by these au- 
thors (1). The numbers indicate the 
frequency of occurrence per million 
words. Thus, an entry of 17 indicates 
that this word occurs 17 times per 
million. An entry of A indicates that 
the stimulus word occurs between 50 
and 100 times per million. An entry 
of AA indicates that the word occurs 
more than 100 times per million 
words. 


COMMENTS ON USE OF MATERIALS


Construction and presentation of 
lists of concepts. It can be seen by 
inspecting Table 2 that a number of 
stimuli elicited a particular category 
of responses. Thus, the response cate- 
gory “round” was prompted by the 
stimuli barrel, doughnut, cherry, dome,
pearl, cabbage, and so on. The fre-
quency of occurrence of a particular
response, of course, varies with the
stimuli. In constructing a list of words
for concept learning, the essential
idea is to use several stimuli which
have elicited the same response cate-
gory. That is, one could use four
words eliciting responses in the
“round” category for the formation
of one concept; four eliciting re-
sponses in the “red” category for an-
other, and so on, until the list is as
long as desired. In our initial studies
we have used 24 stimulus words from
which are to be formed six concepts.
The number of words used as a basis
for a given concept is, of course,
quite arbitrary; we have used four
but certainly this could vary. The
words are randomized and the
order varied from trial to trial.

The S may indicate his grouping of


the stimuli in a number of ways. The 
simplest technique seems to be to ask 
him to name the characteristic which 
the four words have in common. The 
S responds with “round,” or “white,” 
or “large,” depending upon the con-
cept involved. The experimenter in-
forms S after each response whether
he was “right” or “wrong.” Our lists
have been presented at a 4-sec. per 
word rate; this gives time for S to 
respond, to be told “right” or 
“wrong,” and for the experimenter 
to write down the response given. 

“Validation” of scaling. We have
completed an experiment which, in a 
sense, validates the scaling. We 
tested the obvious assumption that 
the more frequent the response com- 
mon to the four words forming a con- 
cept, the more rapid the learning. We 
speak of this as dominance level; i.e., 
the higher the per cent frequency the 
higher the dominance level. For a 
given concept, we can use stimuli
providing several dominance levels.
For example, we have used three 
levels of dominance for each of the 
following concepts: round, small, 
white, smelly, soft, and big. For white 
the stimulus words for three levels of 
dominance are as follows (the numbers
refer to per cent frequency with
which “white” was given as a response):

High             Medium           Low
Dominance        Dominance        Dominance
Milk   83        Bone   34        Baseball  1
Chalk  80        Collar 44        Fang     10
Snow   71        Frost  34        Paste    16
Teeth  72        Lint   38        Sugar    11


Using three levels the results were 
as anticipated; the higher the domi- 
nance the more rapid the acquisition. 
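
The sorting of stimuli into dominance levels can be sketched in a few lines of Python. This is only an illustration and not the authors' procedure: the per cent frequencies for “white” are those legible in Table 2 and in the listing above, while the cutoffs of 60 and 25 per cent and all function names are assumptions introduced here.

```python
# Per cent frequency with which "white" was given to each stimulus word,
# taken from the legible entries of Table 2 and the listing above.
white_pct = {
    "Milk": 83, "Chalk": 80, "Snow": 71, "Teeth": 72,
    "Bone": 34, "Collar": 44, "Frost": 34, "Lint": 38,
    "Fang": 10, "Paste": 16, "Sugar": 11,
}


def dominance_level(pct, high_cutoff=60, low_cutoff=25):
    """Classify a per cent frequency; both cutoffs are hypothetical."""
    if pct >= high_cutoff:
        return "high"
    if pct >= low_cutoff:
        return "medium"
    return "low"


def group_by_dominance(pcts):
    """Return the stimulus words of each dominance level, alphabetically."""
    groups = {"high": [], "medium": [], "low": []}
    for word, pct in sorted(pcts.items()):
        groups[dominance_level(pct)].append(word)
    return groups


levels = group_by_dominance(white_pct)
```

A list for one concept at one dominance level would then be drawn from a single entry of `levels`, four words at a time in the studies described here.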

Use in studying interference effects. 
We believe that using these materials 
a number of studies can be done on 
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intralist interference effects in con- 
cept learning. Variations in amount 
of interference can be produced by 
varying the number and strength of 
competing response tendencies. If 
different stimulus words elicit the 
same response or responses, and if 
these stimulus words are not ex- 
amples of the same concept, inter- 
ference will result. Since we know 
the number and strength of responses 
to each word, systematic variation in 
response competition can be accomp- 
lished. If Ss are “set” by instruc- 
tions for forming concepts based on 
sense impressions, the per cent fre- 
quency values should be accurate, 
hence relative interference strengths 
should be reasonably accurate. We 
also envisage studies on interlist inter- 
ference, working on transfer, retro- 
active and proactive inhibition. Also, 
hypotheses which relate interference 
effects to certain other variables 
(such as anxiety) may find some use 
for these materials. 

Some cautions. We do not as yet 
know all the pitfalls involved in using 
these materials for various purposes. 
But, we think it worthwhile to list 
two which seem to be important. 

1. It should be clear that the per 
cent frequency values do not neces- 
sarily represent the relative strengths 
of responses for a given S. We ob- 
tained only a single response from S 
for each word. We know, say, that to 
a given stimulus 50 per cent of Ss 
responded with “white” and 20 per 
cent with “round.” This does not 

necessarily mean that the 50 per cent 
who responded ‘‘white’’ would, if 
forced to give a second response, re- 
spond with “round” 20 per cent of 
the time. Our values give the per 
cent of Ss responding with a given 


descriptive word, and that is all they 
give. 

2. In constructing concept learn- 
ing lists there are many factors which 
may need equalizing in some way.
For example, suppose we want to 
determine whether color concepts are 
learned more rapidly than some other 
kind. In constructing the lists we 
should keep dominance level equal, 
number and strength of inappropriate 
response tendencies equal, interfer-
ences within each list, perhaps per-
centage of miscellaneous responses,
and so on. Indeed, we have found
that to form certain kinds of lists we
do not have enough words among
the 213 to accomplish what is needed
by way of equalizing factors which
might confound the results. We may
soon find it necessary to scale more 
words. 


SUMMARY 


We have presented the results of
an attempt to develop materials for
use in the study of verbal concept
formation. Our basic objective was to
determine the frequency of response
tendencies to certain verbal stimuli.
If two or more words sometimes elicit
the same responses, these words can
be used as the basis for concept
formation tasks.

The words used were nouns. To
each noun we obtained restricted
associations. The restriction con-
sisted of allowing S to respond only
with sense impressions, such as size,
shape, color, and so on. We have
given the percentage of frequency of
such responses to 213 nouns by 153
Ss, and have discussed some ways in
which these materials may be used in
concept-formation studies.
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A DISTRIBUTION-FREE TEST 


OF ANALYSIS OF 


VARIANCE HYPOTHESES¹


KELLOGG V. WILSON 
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Rao (3, pp. 192-205) has shown 
that a chi-square statistic for a con- 
tingency table can be decomposed 
into components in much the same 
manner as a total sum-of-squares is 
decomposed in analysis of variance 
computations. By making a relatively
simple modification of Rao’s tech-
nique, it is possible to use this type
of analysis in making a distribution-
free (i.e., nonparametric) test of the
hypotheses concerning main effects
and interaction ordinarily tested by 
a two-way (or two-factor) analysis 
of variance. 


Description of Test 


1. The median value, Md, for the 
entire set of n observations is deter- 
mined. This median should not be 
interpolated but should be deter- 
mined only as a “boundary” which 
divides the entire set of observations, 
as nearly as possible, into two groups
of equal size. In the remainder of this
paper, $n_a$ will represent the number of
observations greater than or equal
to Md and $n_b$ will represent the num-
ber of observations less than Md.

2. A $2 \times r \times c$ contingency table is
set up where $r$ is the number of row
conditions in the experimental design
and $c$ is the number of column condi-
tions. The third “dimension” in this
table corresponds to the division of
the scores by Md. Thus, the fre-
quency ${}_b f_{ij}$ will represent the num-
ber of observations less than Md for
the cell in row $i$ and column $j$ and
${}_a f_{ij}$ represents the number of observa-
tions in this cell which are greater than
or equal to Md. Obviously,

$$n_a = \sum_i \sum_j {}_a f_{ij} \quad \text{and} \quad n_b = \sum_i \sum_j {}_b f_{ij}.$$

1 This test was devised and first applied in
the writer’s doctoral dissertation (4).
2 Now at Duke University.


3. The total chi-square value, $\chi_T^2$,
can be computed from equation [1a]
below if the numbers of observations
for each cell of the $r \times c$ experimental
design, $n_{ij} = {}_a f_{ij} + {}_b f_{ij}$, are all equal
and if $n_a = n_b = n/2$.

[1a]  $$\chi_T^2 = \frac{4rc}{n} \sum_i \sum_j \left( {}_a f_{ij} - \frac{n}{2rc} \right)^2$$


If $n_a \neq n_b$ but all $n_{ij}$ are equal, equa-
tion [1b] below can be used.

[1b]  $$\chi_T^2 = \sum_i \sum_j \left[ \frac{({}_a f_{ij} - n_a/rc)^2}{n_a/rc} + \frac{({}_b f_{ij} - n_b/rc)^2}{n_b/rc} \right]$$


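Formula [1c] below is the most
general expression and can be used
without restriction on $n_a$, $n_b$ and the
$n_{ij}$.

[1c]  $$\chi_T^2 = \sum_i \sum_j \left[ \frac{({}_a f_{ij} - n_{ij} n_a/n)^2}{n_{ij} n_a/n} + \frac{({}_b f_{ij} - n_{ij} n_b/n)^2}{n_{ij} n_b/n} \right]$$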


In all three formulas above, the ex-
pected frequencies (i.e., the terms on
the right of the numerators) are ob-
tained from the null hypothesis that
the main effects and interaction ef-
fects produce no change in the dis-
tribution of scores. According to this
hypothesis we should expect that the
proportion, $n_b/n$, of the $n_{ij}$ scores for
each cell should be below Md and the
proportion, $n_a/n$, should be above it.
In all cases, $\chi_T^2$ has $(rc-1)$ degrees
of freedom.

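4. The chi-square values of the
row effects, $\chi_R^2$, and the column ef-
fects, $\chi_C^2$, are computed using the
marginal totals of the $2 \times r \times c$ con-
tingency table. If $n_a = n_b = n/2$ and
all $n_{ij}$ are equal, equations [2a1] and
[2a2] below can be used.

[2a1]  $$\chi_R^2 = \frac{4r}{n} \sum_i \left( {}_a f_{i.} - \frac{n}{2r} \right)^2$$

where ${}_a f_{i.} = \sum_j {}_a f_{ij}$.

[2a2]  $$\chi_C^2 = \frac{4c}{n} \sum_j \left( {}_a f_{.j} - \frac{n}{2c} \right)^2$$

where ${}_a f_{.j} = \sum_i {}_a f_{ij}$.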


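If $n_a \neq n_b$ but all $n_{ij}$ are equal,
formulas [2b1] and [2b2] may be used.

[2b1]  $$\chi_R^2 = \sum_i \left[ \frac{({}_a f_{i.} - n_a/r)^2}{n_a/r} + \frac{({}_b f_{i.} - n_b/r)^2}{n_b/r} \right]$$

[2b2]  $$\chi_C^2 = \sum_j \left[ \frac{({}_a f_{.j} - n_a/c)^2}{n_a/c} + \frac{({}_b f_{.j} - n_b/c)^2}{n_b/c} \right]$$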

Formulas [2c1] and [2c2] are general
expressions and can be used without
restrictions on $n_a$, $n_b$ and the $n_{ij}$.

[2c1]  $$\chi_R^2 = \sum_i \left[ \frac{({}_a f_{i.} - n_{i.} n_a/n)^2}{n_{i.} n_a/n} + \frac{({}_b f_{i.} - n_{i.} n_b/n)^2}{n_{i.} n_b/n} \right]$$

where $n_{i.} = \sum_j n_{ij}$.

[2c2]  $$\chi_C^2 = \sum_j \left[ \frac{({}_a f_{.j} - n_{.j} n_a/n)^2}{n_{.j} n_a/n} + \frac{({}_b f_{.j} - n_{.j} n_b/n)^2}{n_{.j} n_b/n} \right]$$

where $n_{.j} = \sum_i n_{ij}$. In all three pairs
of formulas above, the expected fre-
quencies for the main effects are ob-
tained from the null hypothesis that
the distributions of scores are identical
for all levels of the row or column
conditions. Thus, for $\chi_R^2$, we should
expect the proportion, $n_b/n$, of each
$n_{i.}$ to be below Md and the propor-
tion, $n_a/n$, to be above it. In all
cases, $\chi_R^2$ and $\chi_C^2$ have $(r-1)$ and
$(c-1)$ degrees of freedom, respectively.
5. The chi-square value for the
interaction effect, $\chi_I^2$, is most easily
computed by subtraction as in [3]
below.

[3]  $$\chi_I^2 = \chi_T^2 - \chi_R^2 - \chi_C^2$$

$\chi_I^2$ has $(r-1)(c-1)$ degrees of free-
dom.

The general expression for $\chi_I^2$ is
fairly complex and is given by Rao
(3, p. 103) in somewhat different nota-
tion. Recomputation of $\chi_T^2$, $\chi_R^2$ and
$\chi_C^2$ would appear to provide a faster
check than computation of $\chi_I^2$ from
Rao’s expression.

The expected frequencies for Rao’s
expression are obtained from the
null hypothesis that the ${}_a f_{ij}$ and ${}_b f_{ij}$
for all cells can be predicted from
appropriate marginal totals, in much
the same manner as they are pre-
dicted in a chi-square test of inde-
pendence in a contingency table. In
other words, the hypothesis states
that the row and column effects are
independent.

6. The tests for the main effects
and interaction are made by com-
paring the obtained values of $\chi_R^2$, $\chi_C^2$
and $\chi_I^2$ with values from the cumulative
chi-square distribution for the ap-
propriate degrees of freedom and sig-
nificance level, $\alpha$.
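
The six steps above can be sketched in Python using the general formulas [1c], [2c1], [2c2] and [3]. This is a minimal illustration under stated assumptions rather than a definitive implementation: the function names are invented here, the two tables of counts are passed as nested lists, every expected frequency is assumed to be greater than zero, and the comparison with tabled chi-square values is left to the reader.

```python
def boundary_median(scores):
    """Uninterpolated median: a boundary that splits the scores as evenly
    as ties allow (scores >= the boundary versus scores < the boundary)."""
    ordered = sorted(scores)
    return ordered[len(ordered) // 2]


def chi_square_component(above, below, n_a, n_b):
    """Sum of (observed - expected)^2 / expected over paired counts.

    above[k] and below[k] are the counts >= Md and < Md for one cell or
    one marginal total; expected counts are size * n_a / n and size * n_b / n.
    """
    n = n_a + n_b
    total = 0.0
    for f_a, f_b in zip(above, below):
        size = f_a + f_b
        exp_a = size * n_a / n
        exp_b = size * n_b / n
        total += (f_a - exp_a) ** 2 / exp_a + (f_b - exp_b) ** 2 / exp_b
    return total


def median_anova_two_way(above, below):
    """above[i][j], below[i][j]: counts >= Md and < Md in row i, column j.

    Returns {effect: (chi_square, degrees_of_freedom)}.
    """
    r, c = len(above), len(above[0])
    n_a = sum(map(sum, above))
    n_b = sum(map(sum, below))
    chi_t = chi_square_component(
        [f for row in above for f in row],
        [f for row in below for f in row], n_a, n_b)          # [1c]
    chi_r = chi_square_component(
        [sum(row) for row in above],
        [sum(row) for row in below], n_a, n_b)                # [2c1]
    chi_c = chi_square_component(
        [sum(col) for col in zip(*above)],
        [sum(col) for col in zip(*below)], n_a, n_b)          # [2c2]
    chi_i = chi_t - chi_r - chi_c                             # [3]
    return {
        "total": (chi_t, r * c - 1),
        "rows": (chi_r, r - 1),
        "columns": (chi_c, c - 1),
        "interaction": (chi_i, (r - 1) * (c - 1)),
    }
```

Because the same helper handles cells and marginal totals, the special cases [1a], [1b], [2a] and [2b] need no separate code; they are simply what the general formulas reduce to when the cell sizes are equal.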


Computational Example 
Suppose that we have an obtained
distribution of error scores in dial
reading for each of 16 subjects in
each of 9 experimental groups. Also,
let us suppose that three different
dials, A, B, and C, and three different
levels of illumination, 1, 2, and 3,
were used in all possible combinations
so there is a total of nine conditions,
each of which was presented to a
different group. Let the median of
the combined distributions, Md, be
50 and suppose that we have the
$2 \times 3 \times 3$ contingency table in Table 1.
Since all $n_{ij} = 16$ and $n_a = n_b = 72$,
formula [1a] can be used to compute
$\chi_T^2$.

$$4rc/n = 4 \cdot 3 \cdot 3/144 = 1/4, \qquad n/2rc = 144/(2 \cdot 3 \cdot 3) = 8$$

$$\chi_T^2 = \tfrac{1}{4}\left[(2-8)^2 + (4-8)^2 + (5-8)^2 + (7-8)^2 + (9-8)^2 + (8-8)^2 + (10-8)^2 + (13-8)^2 + (14-8)^2\right] = 128/4 = 32.000$$

Formulas [2a1] and [2a2] were used
to compute $\chi_R^2$ and $\chi_C^2$.

$$4r/n = 4c/n = 1/12, \qquad n/2r = n/2c = 144/(2 \cdot 3) = 24$$

$$\chi_R^2 = \tfrac{1}{12}\left[(11-24)^2 + (24-24)^2 + (37-24)^2\right] = 338/12 = 28.167$$

$$\chi_C^2 = \tfrac{1}{12}\left[(19-24)^2 + (26-24)^2 + (27-24)^2\right] = 38/12 = 3.167$$

Equation [3] was used to compute
$\chi_I^2$.

$$\chi_I^2 = 384/12 - 338/12 - 38/12 = 8/12 = 0.667$$

Since $\chi_R^2$ has two degrees of freedom,
the row effect is significant beyond
the 0.1% level. $\chi_C^2$, with two degrees
of freedom, falls short of significance
at the 10% level and $\chi_I^2$ is obviously not
significant.


TABLE 1

CONTINGENCY TABLE FOR DIAL READING
EXPERIMENT³

Scores ≥ Md = 50
              Illumination Level
              1     2     3    Total
Dials   A     2     4     5     11
        B     7     9     8     24
        C    10    13    14     37
Total        19    26    27     72

Scores < Md = 50
              Illumination Level
              1     2     3    Total
Dials   A    14    12    11     37
        B     9     7     8     24
        C     6     3     2     11
Total        29    22    21     72
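
The arithmetic of this example can be rechecked with a short Python sketch. It uses only the nine cell frequencies that enter the computation above (the counts at or above Md, with dials as rows and illumination levels as columns) and the equal-cell formulas [1a], [2a1], [2a2] and [3]. The helper for tail probabilities relies on the closed form of the chi-square distribution for an even number of degrees of freedom; its name is an assumption introduced here.

```python
import math

# Counts of scores >= Md; rows are dials A, B, C, columns are illumination 1, 2, 3.
above = [[2, 4, 5],
         [7, 9, 8],
         [10, 13, 14]]
r, c, n = 3, 3, 144

chi_t = (4 * r * c / n) * sum((f - n / (2 * r * c)) ** 2
                              for row in above for f in row)      # [1a]
chi_r = (4 * r / n) * sum((sum(row) - n / (2 * r)) ** 2
                          for row in above)                       # [2a1]
chi_c = (4 * c / n) * sum((sum(col) - n / (2 * c)) ** 2
                          for col in zip(*above))                 # [2a2]
chi_i = chi_t - chi_r - chi_c                                     # [3]


def chi_square_tail_even_df(x, df):
    """Upper-tail probability of chi-square for even df:
    exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    if df % 2 != 0:
        raise ValueError("this closed form applies to even df only")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


p_rows = chi_square_tail_even_df(chi_r, 2)
p_columns = chi_square_tail_even_df(chi_c, 2)
p_interaction = chi_square_tail_even_df(chi_i, 4)
```

With two degrees of freedom the tail probability is simply `exp(-x/2)`, so the row and column effects can be judged without a printed table.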


Small Expected F requencies 


Since r and c will be relatively small
for most experiments (i.e., less than
10), pooling of cells to avoid small
expected frequencies would probably
result in a serious loss of information.
In a recent article, Cochran (1)
advises against pooling in any appli-
cations of the chi-square test and also
concludes that the long accepted
minimum expected values of between
5 and 10 are too conservative. Coch-
ran also states that ordinary chi-
square tables can be used with con-
tingency tables with more than one
degree of freedom where small ex-
pected frequencies are relatively few
(about one in five or more) and where
the minimum expected frequency is
as small as one. If the contingency
table has 30 or more degrees of free-
dom, an ordinary chi-square table
can be used even if most expected
frequencies are small and the mini-
mum frequency is as small as two.
Cochran gives references for exact
tests for contingency tables where
ordinary chi-square tables cannot be
used. Also, Rao (3, pp. 201-205)
gives exact tests which are relatively
simple for small sample sizes. There-
fore, it appears that small expected
frequencies need not prevent use of
the analytic technique described
in this paper.

3 The data in Table 1 are fictitious and
intended for illustrative purposes only.


Relationship to Mood Tests 


Mood (2, ch. 16) discusses several 
extensions of the median test which 
are distribution-free tests of analysis 
of variance hypotheses. In his treat- 
ment of the two-way design with 
replication, Mood describes exact 
tests and chi-square approximate 
tests which are computationally simi- 
lar to those described in this paper. 
However, his test for interaction ef- 
fects is quite different and consists 
of making a series of iterative trans- 
formations of the scores until the 
medians of the transformed scores
are zero for all rows and columns.
After this is done, the interaction
hypothesis is tested by computing
a chi-square test of independence for
a $2 \times r \times c$ contingency table whose
third “dimension” corresponds to the
differences in sign of the transformed
scores. This test is obviously tedious
and Mood states, without further
qualification, that it is “very nearly
but not quite distribution free” (2, p.
405). Thus, the test described in
this paper appears to have consider-
able advantages in the treatment of
interaction effects.


Use of Test with Experimental Design 
with other than Two Factors 

The analytic technique described
in this paper may easily be used with
a one-factor experimental design. If
there are r conditions, a $2 \times r$ con-
tingency table may be set up and a
chi-square value for this table can
be computed as in equations [2a1],
[2b1] and [2c1]. An essentially equiva-
lent test is described by Mood (2, p.
398). If a two or more factor design
without replication is used, the main
effects of the design can be tested in
this manner if the number of condi-
tions for the other main effects is
sufficiently large. Otherwise, an
exact test could be used.

Extension of this test for use with
designs of three or more factors, with
replication, is also possible. The use
of this test in analyzing a three-factor
design, with replication, is described
below.

1. The over-all median, Md, for
the entire set of n scores is deter-
mined and a $2 \times r \times c \times b$ contingency
table is set up, where $b$ is the number
of blocks, in the same manner as in
the analysis of the two-factor
design. In this table, ${}_b f_{ijk}$ represents
the number of observations in the
cell for row $i$, column $j$ and block $k$
which are less than Md and ${}_a f_{ijk}$
represents the number of observa-
tions in this cell greater than or equal
to Md.

2. A total chi-square value, $\chi_T^2$,
can be computed from a relatively
slight modification of equations [1a],
[1b] or [1c]. The most general form,
which is modified from [1c], is given
below.

[4]  $$\chi_T^2 = \sum_i \sum_j \sum_k \left[ \frac{({}_a f_{ijk} - n_{ijk} n_a/n)^2}{n_{ijk} n_a/n} + \frac{({}_b f_{ijk} - n_{ijk} n_b/n)^2}{n_{ijk} n_b/n} \right]$$

where $n_{ijk} = {}_a f_{ijk} + {}_b f_{ijk}$. $\chi_T^2$ is dis-
tributed with $(rbc-1)$ degrees of
freedom.
3. Chi-square values for the main
effects, $\chi_R^2$, $\chi_C^2$, and $\chi_B^2$, are computed
in almost the same manner as for the
two-factor analysis. The general
equation for $\chi_B^2$ is given below in
equation [5].

[5]  $$\chi_B^2 = \sum_k \left[ \frac{({}_a f_{..k} - n_{..k} n_a/n)^2}{n_{..k} n_a/n} + \frac{({}_b f_{..k} - n_{..k} n_b/n)^2}{n_{..k} n_b/n} \right]$$

where $n_{..k} = \sum_i \sum_j n_{ijk}$ and
${}_a f_{..k} = \sum_i \sum_j {}_a f_{ijk}$. $\chi_R^2$ and $\chi_C^2$ may be
computed as above with appropriate
changes in the subscripts. $\chi_R^2$, $\chi_C^2$,
and $\chi_B^2$ are distributed with $(r-1)$,
$(c-1)$ and $(b-1)$ degrees of freedom,
respectively.
4. The total of the chi-square
values for the four interaction effects,
$\chi_I^2$, can be computed by subtraction
as in equation [6] below.

[6]  $$\chi_I^2 = \chi_T^2 - \chi_R^2 - \chi_C^2 - \chi_B^2$$

If $\chi_I^2$ is less than is required for
significance for the interaction effect

of freedom, the analysis may be

5 and 6 below.
5. A $2 \times b \times c$ table, a $2 \times r \times b$ table
and a $2 \times r \times c$ table are set up by sum-
ming the frequencies in the $2 \times r \times c \times b$
contingency table across rows,
columns, and blocks, respectively.
A $\chi_T^2$ is computed for each of these
tables by using formulas [1a], [1b] or
[1c], with appropriate changes in
subscripts, as in the two-factor analy-
sis. Thus, a ${}_{BC}\chi_T^2$ is obtained for the
$2 \times b \times c$ table which was summed
across rows, a ${}_{RB}\chi_T^2$ is obtained for
the $2 \times r \times b$ table which was summed
across columns and a ${}_{RC}\chi_T^2$ is ob-
tained for the $2 \times r \times c$ table which
was summed across blocks.

6. The chi-square values for the
three double interaction effects are
computed as in the equations for
[7] below. The pairs of main effects
for which these interactions are being
computed are indicated by the left-hand
subscripts.

[7]  $${}_{BC}\chi_I^2 = {}_{BC}\chi_T^2 - \chi_B^2 - \chi_C^2$$
     $${}_{RB}\chi_I^2 = {}_{RB}\chi_T^2 - \chi_R^2 - \chi_B^2$$
     $${}_{RC}\chi_I^2 = {}_{RC}\chi_T^2 - \chi_R^2 - \chi_C^2$$

${}_{BC}\chi_I^2$, ${}_{RB}\chi_I^2$ and ${}_{RC}\chi_I^2$ are distributed
with $(b-1)(c-1)$, $(r-1)(b-1)$
and $(r-1)(c-1)$ degrees of free-
dom, respectively.

7. The chi-square value for the
triple interaction effect, ${}_{RBC}\chi_I^2$, is
computed as in equation [8] below.

[8]  $${}_{RBC}\chi_I^2 = \chi_I^2 - {}_{BC}\chi_I^2 - {}_{RB}\chi_I^2 - {}_{RC}\chi_I^2$$

${}_{RBC}\chi_I^2$ is distributed with
$(r-1)(c-1)(b-1)$ degrees of freedom.

8. The tests for the three main
effects and the four interaction ef-
fects are made by comparing the ob-
tained chi-square values with the
values obtained from a table of the
cumulative chi-square distribution
for the appropriate degrees of free-
dom and significance level, $\alpha$.
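
The three-factor decomposition in equations [4] through [8] can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the author's own computing routine: the counts are held in two dictionaries keyed by `(row, column, block)` with identical keys, all expected frequencies are assumed positive, and the function names are hypothetical.

```python
def component(pairs, n_a, n_b):
    """Sum of (observed - expected)^2 / expected over (f_a, f_b) pairs."""
    n = n_a + n_b
    total = 0.0
    for f_a, f_b in pairs:
        size = f_a + f_b
        exp_a, exp_b = size * n_a / n, size * n_b / n
        total += (f_a - exp_a) ** 2 / exp_a + (f_b - exp_b) ** 2 / exp_b
    return total


def margin(table, keep):
    """Sum a {(i, j, k): count} table over the axes not listed in keep."""
    out = {}
    for idx, count in table.items():
        key = tuple(idx[axis] for axis in keep)
        out[key] = out.get(key, 0) + count
    return out


def median_anova_three_way(above, below):
    """above/below: {(i, j, k): count of scores >= Md / < Md}."""
    n_a, n_b = sum(above.values()), sum(below.values())

    def chi(keep):
        m_a, m_b = margin(above, keep), margin(below, keep)
        return component([(m_a[key], m_b[key]) for key in m_a], n_a, n_b)

    R, C, B = 0, 1, 2
    total = chi((R, C, B))                                    # [4]
    main = {"R": chi((R,)), "C": chi((C,)), "B": chi((B,))}   # [5]
    interaction_total = total - sum(main.values())            # [6]
    double = {                                                # [7]
        "BC": chi((C, B)) - main["B"] - main["C"],
        "RB": chi((R, B)) - main["R"] - main["B"],
        "RC": chi((R, C)) - main["R"] - main["C"],
    }
    triple = interaction_total - sum(double.values())         # [8]
    return {"total": total, "main": main,
            "interaction_total": interaction_total,
            "double": double, "triple": triple}
```

Each two-way table of step 5 is obtained here by summing over the one axis left out of `keep`, so no separate tables need to be written down by hand.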


In such analyses, a
total chi-square value is computed for 
a contingency table where all condi- 


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values are computed for all possible 
tables of two or more main effects 
in the full contingency table. The 
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chi-square values for the interaction 
effects can then be computed by sub- 
traction. 
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A METHOD OF ACTUARIAL PATTERN ANALYSIS 


D. T. LYKKEN 
University of Minnesota


A number of papers appearing 
during the last few years have con- 
sidered the problem of analyzing pat- 
terns or profiles of psychometric test 
scores. Several indices of ‘‘profile 
similarity” have been proposed (1, 
2, 4, 7, 8, 9). In a recent review, 
Cronbach and Gleser (2) show that 
these different indices may be all 
regarded as variants of the general 
Pythagorean formula for the linear 
distance between two points in n- 
dimensional space, where n is the 
number of scores in the profile, which 
itself is considered as a vector or point 
in the test space. 

In this same paper, the authors 
make a point which is of the utmost 
importance and which undercuts the 
whole of this literature and renders 
even their own conclusions, while 
correct, essentially irrelevant. They 
point out that “similarity is not a 
general quality. It is possible to discuss 
similarity only with respect to specified
dimensions” (italics in original) (2, p.
457). A pattern of test scores, such
as an MMPI profile or a Rorschach
psychogram, is thought to embody 
predictive validity with respect to a 

considerable number of different psy- 
chological dimensions of the testee. 
Two individuals may be similar in 
some of these dimensions and wholly 
dissimilar in others; it is meaningless 
to speak of people or other complex 
entities as being “similar” without 
specifying some dimension(s) of com- 
parison. 

If one is interested in similarities 
between test patterns, conceived as 
geometrical configurations, the rele- 
vant dimension of comparison is in- 
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deed measured by Cronbach and 
Gleser’s D function. However, the 
psychologist is not interested in 
geometrical configurations but in 
people. If the MMPI profile em- 
bodies information concerning two 
independent psychological attributes, 
it is obvious that two such profiles, 
a given distance apart in the test 
space, may represent similarity with 
respect to one attribute and dissimi- 
larity with respect to the other. Only 
by remote coincidence would a meas- 
ure of the dimension of geometrical 
distance serve also as a metric for a 
psychological dimension, and a suit- 
able index of similarity for one at- 
tribute could not, by definition, meas- 
ure similarity in the other attribute 
if the two are truly independent. For 
the purposes of the psychologist, no
single index of profile similarity can
be expected to have general utility nor is
it reasonable to expect that a meas-
ure of geometrical similarity will
have any utility.

Two examples. The motivation 
behind this recent interest in the 
analysis of test patterns is the con- 
viction of the clinician that not all 
of the psychological meaning of a
profile of test scores can be abstracted
in any linear combination of them.
That is, the clinician believes that
certain psychological dimensions,
which we shall call the criteria, are
nonlinear joint functions of the
several tests in the profile. This is
the only valid basis for the current
emphasis on patterns as such. We
shall consider two simple examples
of such nonlinear relationships and
illustrate (a) why measures of geomet-
rical distance are irrelevant to the
problem, and (b) a general method for 
exploiting any nonlinear validity 
for estimating the criterion which 
may be in the profile. 

Meehl (7) has demonstrated that
two “yes-no” test items having zero
validity with respect to a dichoto-
mous criterion could have up to per-
fect validity when considered jointly.
This “paradox” of Meehl’s may be
seen to be a special case, for di-
chotomous distributions, of the func-
tion $y = x_1 x_2$, where $y$ is the (continu-
ous) criterion and $x_1$ and $x_2$ are the
(continuous) predictors or test scores.
The intercorrelations between the
criterion and the tests individually
may be zero and yet the criterion be
entirely predictable from the scores
taken together, that is, from their
“profile.” For convenience, we shall
assume all variables to be expressed
in standard scores. The test space in
this example is the $x_1 x_2$-plane. Sup-
pose the criterion, $y$, increases posi-
tively with increasing absolute values
of $x_1$ and $x_2$ in the (++) and (−−)
quadrants of this plane, and negative-
ly with increasing absolute values of
$x_1$ and $x_2$ in the (+−) and (−+)
quadrants. Then the profile (3, 3)
is psychologically equivalent to the
profile (−3, −3) but rather different
from (1, 1). Geometrically, however,
the patterns (3, 3) and (−3, −3) are
far removed from one another and
(3, 3) is much more similar to (1, 1).
The only psychological criterion for
which similarity to the reference pro-
file (3, 3) could be expressed by the
Pythagorean D function would be
one of the form $y^2 = (x_1 - 3)^2 +
(x_2 - 3)^2$. Thus, none of the indices now
available have any relevance to the
psychological dimension of this ex-
ample.
Or consider another instance. A
Rorschach expert might contend
that some psychological attribute,
say, emotionality, could be estimated
from two Rorschach indices taken
together as a pattern. He might sug-
gest that, when the first index, $x_1$, is
within average limits, the other
index, $x_2$, is proportional to emotion-
ality. But when $x_1$ is above or below
average values, then $x_2$ varies as the
inverse of emotionality. That is,
$y = (x_2 \text{ for } |x_1| < 2;\ 1/x_2 \text{ for } |x_1| > 2)$
+ error of estimate. In this case, the
pattern (3, 3) is psychologically
equivalent to the pattern (−3, 3),
whereas the geometrical distance be-
tween these two points is consider-
able. (3, 3) is psychologically very
different from (1, 1) although the
configurations are geometrically close
together. Note further that an index
that would be appropriate as a metric
for the first example would not do
at all here. The points (3, 3) and
(−3, −3), which are equivalent for
the function $y = x_1 x_2$, are very dissimi-
lar with respect to the criterion of
this second example.
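
The piecewise criterion of this second example can be written out directly. The Python sketch below omits the error of estimate and, as an assumption made only for this illustration, assigns the boundary case $|x_1| = 2$ (which the text leaves undefined) to the inverse branch; the function name is hypothetical and $x_2$ is assumed to be nonzero on that branch.

```python
import math


def emotionality(x1, x2):
    """Noise-free criterion: x2 when x1 is within average limits,
    1/x2 otherwise (the case |x1| == 2 is placed here by assumption)."""
    if abs(x1) < 2:
        return x2
    return 1 / x2


y_reference = emotionality(3, 3)      # inverse branch
y_mirror = emotionality(-3, 3)        # inverse branch, same x2
y_average = emotionality(1, 1)        # proportional branch
y_opposite = emotionality(-3, -3)     # inverse branch, opposite sign

d_mirror = math.dist((3, 3), (-3, 3))
d_average = math.dist((3, 3), (1, 1))
```

Here `(3, 3)` and `(-3, 3)` share a criterion value while lying farther apart than `(3, 3)` and `(1, 1)`, and `(-3, -3)`, which matched `(3, 3)` under $y = x_1 x_2$, now takes the opposite sign.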

It is the hypothesis of pattern 
analysis that psychological criteria 
can be validly estimated by nonlinear 
joint functions of sets of test scores. 
If several independent criteria can 
be estimated from the same set of 
scores, the joint function suitable 
for predicting one will not in general 
serve for predicting any other. This 
is the same as saying that any index
which will measure similarity of (or
psychological distance between) two
persons on a given dimension will in
general give spurious results in com-
paring the two persons (or profiles)
on any other dimension. For the
same reasons, of the indices thus far
proposed (Cronbach and Gleser’s D,
Cattell’s $r_p$, Du Mas’ $r_{ps}$, Kendall’s
tau, Meehl’s difference index), all
of which measure various aspects of
the geometrical distance between pro-
files, none are of any general use for
the estimation of psychological cri-
teria from test score profiles or for
measuring psychological distance be-
tween profiles.

A solution. A simple, general 
method does exist, however, which 
will do both of these jobs. Consider 
again the predictor space defined by 
the test axes $x_1$ and $x_2$, expressed in
standard scores. One wishes to pre- 
dict a psychological criterion, y, from 
the “profile” formed by a pair of 
scores on the two tests (or to measure 
similarity with respect to the dimen- 
sion y between given pairs of profiles, 
which is the same thing). To get 
the utmost predictive validity possible,
one allows y to be any joint
function of $x_1$ and $x_2$, i.e.,
$y = f(x_1, x_2)$ + error of estimate. One obtains
a sample of N profiles with a value of
y associated with each as measured
by some external criterion. The N
profiles are plotted as points in the
test space. The test axes are drawn
orthogonal and intersect at $x_1 = x_2 = 0$
(see Fig. 1).

Suppose that the y values associated
with the profiles in each quadrant
of this space are averaged separately
for each quadrant. If the function
$f(x_1, x_2)$ is in fact $y = x_1 x_2$ as in
the first example above, the y means
for the (++) and (−−) quadrants
will be positive and approximately
equal. The y means for the (+−)
and (−+) quadrants will be negative
and about equal. One is now in
a position to say that profiles in,
e.g., quadrants (++) and (−−) are
similar with respect to y and those
in quadrants (++) and (+−) are
dissimilar. Similarly, the best estimate
for y for any profile in the (++)
quadrant will be the mean y value 
found for that quadrant. 
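
The quadrant averaging can be sketched as a small simulation. This is an illustration under stated assumptions rather than the author's procedure: the sample size, the normally distributed standard scores, and the error standard deviation of 0.5 are all hypothetical choices.

```python
import random
from statistics import mean

random.seed(0)

def quadrant(x1, x2):
    """Label a profile by the signs of its two standard scores."""
    return ("+" if x1 >= 0 else "-") + ("+" if x2 >= 0 else "-")

cells = {"++": [], "+-": [], "-+": [], "--": []}
for _ in range(4000):
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    y = x1 * x2 + random.gauss(0, 0.5)   # y = x1*x2 plus error of estimate
    cells[quadrant(x1, x2)].append(y)

# The table of quadrant means serves as the prediction table.
quadrant_means = {q: mean(ys) for q, ys in cells.items()}
for q, m in quadrant_means.items():
    print(q, round(m, 2))
```

The (++) and (--) means come out positive and close to each other, and the (+-) and (-+) means negative and close to each other, which is the pattern described above.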

Such estimates would of course be
extremely crude. To increase the
precision of measurement, one partitions
the test axes more finely, i.e.,
uses more cells in the test space. In
the case represented in Figure 1,
each axis has been partitioned at its
quartiles, yielding a space of 16 cells.
Again, the y means are calculated
for each cell. The numbers entered
in the cells of Figure 1 are the rank
orders of size of the y means which
would result if the function involved
is $y = x_1 x_2$. Now one can say, for
example, that profiles in cells ad and
da are equivalent with respect to y
and are less similar to those in cell bc
than are profiles in cell bd. Again,
the best estimate of the value of y
is the y mean of the cell in which the
profile falls. If another psychological
dimension, z, is to be estimated
from the profile $x_1$, $x_2$, one
constructs a separate table of cell
means for z in the same way. The
degree of predictability of the criterion
from the profile or set of scores
is conveniently indicated by the
statistic multiple-eta (6) and significance
tests are also available.

FIG. 1. RANK-ORDER OF y MEANS IN THE 16 CELLS OF THE TEST SPACE: $y = x_1 x_2$

The method can obviously be
generalized to any number, n, of
tests. If each test is partitioned into
k intervals, there will be $k^n$ cells in
the test space and $k^n$ entries in the
prediction table containing the cell
means. The method is applicable
to categorical as well as
continuous criteria, in which case
the most frequent category of y occurring
in a given cell is entered in
the table instead of the y mean. The
method may be used with categorical
test data as well or even mixtures of
the two. The categories of $x_1$, in this
case, serve as the intervals for $x_1$.

This method requires sufficient entries
in each cell of the test space so
that the mean or modal y values obtained
for each cell will be stable.
Since there are $k^n$ cells in the space,
this means that an exceedingly large
sample N will be required to provide
normative data for profiles composed 
of several tests if there are to be many 
intervals k, on each test axis. It can 
be shown, however (6), that with 
test reliabilities of the order presently 
available in psychological work, ex- 
tremely coarse partitioning (e.g., 
four, three, even two intervals per 
axis) introduces relatively little ad- 
ditional error of measurement. More- 
over, it usually happens that the 
tests in a profile are not independent 
of each other, in which case values 
will occur in many of the cells only 
rarely. These cells may be dis- 
regarded for practical purposes, thus 
considerably reducing the sample N 
required. 
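
A minimal sketch of the general procedure follows. Several points are assumptions rather than statements from the text: each test is cut into k equal-frequency intervals, the data are simulated, and multiple-eta is taken to be the ordinary correlation ratio (square root of between-cell over total sum of squares), since reference (6) is not reproduced here. All function names are hypothetical.

```python
import random
from bisect import bisect_right
from collections import defaultdict
from statistics import mean, quantiles

random.seed(1)

def cell_of(profile, cutpoints):
    """Tuple of interval indices (one per test) locating a profile's cell."""
    return tuple(bisect_right(cuts, x) for x, cuts in zip(profile, cutpoints))

def cell_table(profiles, ys, k):
    """Partition every test into k equal-frequency intervals (k >= 2) and
    return the cut points, the table of cell means, and the raw cell members."""
    cutpoints = [quantiles(column, n=k) for column in zip(*profiles)]
    members = defaultdict(list)
    for profile, y in zip(profiles, ys):
        members[cell_of(profile, cutpoints)].append(y)
    means = {cell: mean(values) for cell, values in members.items()}
    return cutpoints, means, members

def eta(members, ys):
    """Correlation ratio of y on cell membership (assumed form of multiple-eta)."""
    grand = mean(ys)
    ss_total = sum((y - grand) ** 2 for y in ys)
    ss_between = sum(len(v) * (mean(v) - grand) ** 2 for v in members.values())
    return (ss_between / ss_total) ** 0.5

# Simulated two-test profiles with the criterion y = x1*x2 plus error.
profiles = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(3000)]
ys = [x1 * x2 + random.gauss(0, 0.5) for x1, x2 in profiles]

cutpoints, means, members = cell_table(profiles, ys, k=4)
eta_value = eta(members, ys)

# Predicting y for a new profile: look up the mean of its cell.
predicted = means[cell_of((1.5, 1.2), cutpoints)]
print(len(means), round(eta_value, 2), round(predicted, 2))
```

With two tests and four intervals each, the table has 4² = 16 entries; the same functions accept profiles of any length n, at the cost of $k^n$ cells to fill.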


A PROGRAM OF EMPIRICAL PAT- 
TERN ANALYSIS FOR THE 
MMPI 


To illustrate a practical clinical 
application of this method, we shall 
refer to the 9-scale profile of the 
MMPI. If one dichotomizes each 
scale axis, there result 512 cells in the 
test space. It will be convenient to 
number these cells from 0 to 511 and 
to code the profiles in such a way 
that the number of the cell to which 
a profile belongs is immediately given 
by its code. For scale values above 
the median, assign the code number 1
and for scale values below the median 
assign the code number 0. The codes 
for the 9 scales of the profile are 
written in the usual scale order; thus 
the profile 48, 58, 52, 63, 53, 72, 65, 
64, 53 would be coded 000101110, if 
60 were the median for each scale. 
This code may be read as a binary 
number, equal in this case to 46— 
this profile falls into cell 46 in the 
test space. 
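
The coding rule can be written out as a short sketch. The function names are hypothetical, and a score exactly equal to the median, which the text does not cover, is coded 0 here by assumption.

```python
def code_profile(scores, medians):
    """Binary code string: '1' for a scale above its median, '0' otherwise.
    A score equal to the median is coded '0' (an assumption)."""
    return "".join("1" if s > m else "0" for s, m in zip(scores, medians))

def cell_number(code):
    """Read the code as a binary number to obtain the cell number."""
    return int(code, 2)

profile = [48, 58, 52, 63, 53, 72, 65, 64, 53]
medians = [60] * 9   # the text's example median for every scale

code = code_profile(profile, medians)
cell = cell_number(code)
print(code, cell)   # 000101110 46
```

Because the code is itself the cell number in binary, filing and retrieval need no separate lookup table.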

Over a period of time, profiles could 
be collected, coded, and filed under 
the appropriate cell designation together
with whatever psychological
data are available on the subjects 
producing the profiles. As data ac- 
cumulate for the more popular cells, 
the psychological picture character- 
istic of these cells will emerge. An 
investigator interested in a particular 
dimension, y, can go to the files and 

compute y means for those cells for
which sufficient data have accumu- 
lated. Multiple-eta (6) and the 
associated significance tests will tell 
him the extent to which MMPI 
profiles can predict that dimension; 
the table of cell means for y (or modes 
if y is qualitative) will be his pre- 
dicted values for y for new profiles 
belonging to these cells; profiles be- 
longing to cells having similar means 
he will regard as equivalent with respect
to y. If the psychological data
filed with the profiles do not include
the y dimension, the investigator
must gather a new sample. The
files will help him to the extent of
informing him which cells are sufficiently
popular to warrant being sampled;
e.g., if he wishes to predict y
for college freshmen, he will ordinari- 
ly need data only on those cells in 
which profiles most commonly occur 
for this population. 
If it is desired to increase the precision
of measurement, the number of
intervals, k, on each scale axis may
be increased from 2 to 3. However,
with 9 scales in the profile, this would
result in over 19,000 cells in the test
space. Although many (perhaps
thousands) of the cells would be essentially
empty in the population and
could be ignored, this would still require
an impracticably large sample
on which to base the prediction tables.
One solution would be to reduce the
number of scales considered, either
by eliminating the less discriminating
scales or perhaps by using the
first few principal component factors
in their stead. Six scales with three
intervals on each would probably be
a workable number. 
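
The cell counts behind this choice follow from the $k^n$ rule. The sketch below only tabulates them; the sample size of 10,000 used for the average number of profiles per cell is a hypothetical figure, not one given in the text.

```python
def cell_count(scales, intervals):
    """Number of cells when each of `scales` axes is cut into `intervals` parts."""
    return intervals ** scales

designs = {
    "9 scales, 2 intervals": cell_count(9, 2),
    "9 scales, 3 intervals": cell_count(9, 3),
    "6 scales, 3 intervals": cell_count(6, 3),
}

HYPOTHETICAL_N = 10_000   # illustrative sample size only
for name, cells in designs.items():
    print(name, cells, round(HYPOTHETICAL_N / cells, 1))
```

Nine trichotomized scales give 19,683 cells, whereas six give 729, which is of the same order as the 512 cells of the dichotomized nine-scale profile.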

Another solution would be to remain
with the nine dichotomized
scales and add as a tenth dimension
a dichotomous index of general elevation,
such as $\left(\sum_{i=1}^{9} x_i^2\right)^{1/2}$, coded 1
if above the grand scale mean and 0
if below it. This would double the
number of cells in the test space and
increase the precision of measurement
considerably. (In the two-dimensional
example above, where
the quadrants formed the four cells
of the test space, adding such an
elevation index would amount to
superimposing a circle on the test
axes with its center at the origin. The
four enclosed sectors of the circle
would form four cells of the new test
space and the four quadrant areas
outside the circle would make up the
total of eight. In the MMPI test
space, there would be 512 low-elevation
cells within the elevation index hypersphere
and 512 high-elevation cells
outside it.)

The method proposed requires that
profiles falling within a given cell
be regarded as psychologically equivalent.
At first glance, this would appear
to entail the same fallacy
inveighed against in the first part of this
paper; viz., the assumption that the
dimensions of psychological distance
can be ordered to a continuum of
geometrical distance. However, this
is not really the case. First of all, it
is clear that all the dimensions in
question share the same zero point;
identical profiles are zero distance
apart geometrically and psychologically.
Error variance in the tests
composing a profile will cause the
observed points to scatter in the test
space around the point representing
the “true” profile; psychologically
equivalent profiles will in general be
observed to differ from each other by
small amounts of geometrical distance. 
Thus, it can be seen that the D func- 
tion measures something of psycho- 
logical interest after all, namely, 
joint error of measurement in the 
profile. Therefore, the treatment of 
points within small regions of the 
test space as psychologically equiva- 
lent is not inconsistent with the argu- 
ments presented earlier. Moreover, 
error variance aside, only exceedingly 
complex functions will differ sharply 
between two adjacent small regions 
of the predictor space. One can only 
adopt the compromise approach of 
partitioning the test space as finely 
as is consonant with the reliabilities 
of the profiles and the available N. 


SUMMARY 


1. The “similarity” of test profiles,
or of the people producing them, can
be defined only in terms of specified
dimensions of comparison.

2. An index of similarity which
correctly orders profiles along one dimension
of comparison will not in
general be appropriate for measuring
any other, independent, dimension.

3. Existing indices of profile similarity
order profiles as geometrical
points along some dimension of distance
in the test space. Only by
remote coincidence will any such
index serve as a measure of any psy- 
chological dimension related to the 
test pattern or as an index of simi- 
larity of persons producing the pro- 
files. 

4. The hypothesis of pattern 
analysis is that various psychological 
variables may be best estimated by 
nonlinear joint functions of the tests 
composing a given profile. The func- 
tion that will estimate one variable 
will not in general estimate any 
other. 

5. Multiple-eta provides a simple, 
general method for analyzing com- 
plex joint functional relationships. 
Basically a curve-fitting technique, 
it requires the gathering of normative 
data and may be thought of as an 
actuarial method of pattern analysis. 
With this method, criteria and test 
data may be continuous or categori- 
cal. The method makes possible pre- 
diction of the criterion from the pro- 
file, measures the degree and significance
of this predictability, and allows for
the assessment of similarity and dis- 
similarity of profiles with respect to 
the criterion dimension. 

6. An illustrative possible applica- 
tion of this method to the MMPI in 
the form of a long-term cumulative 
research project is described.


REMARKS CONCERNING WILLERMAN’S PAPER ON
KENDALL’S W AND SOCIOMETRIC-TYPE RANKING

F. KRÄUPL TAYLOR


Bethlem Royal and Maudsley Hospital, Institute of Psychiatry, University of London 


The problem of adapting Kendall’s
coefficient W of concordance to sociometric-type
ranking (or generally,
to rank matrices in which the number
of rankers and the number of
ranked are equal, and in which the
principal diagonal is blank) had already
been considered by me in 1951 (2).

The adapted coefficient of concordance
was designated W’, and the
formula presented was

$$W' = \frac{12 S'}{n^3 (n^2 - 1)}$$

in which

$$S' = \frac{n^2}{(n - 2)^2}\, S.$$


This formula is identical with that 
given by Willerman if S’ is replaced 
by S. 
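
A short numerical check of the adapted coefficient is sketched below. It rests on assumptions not spelled out in this note: S is taken to be Kendall's usual sum of squared deviations of the column rank totals from their mean, and the formulas are read as $W' = 12S'/[n^3(n^2 - 1)]$ with $S' = n^2 S/(n - 2)^2$. The function names and the two test matrices are hypothetical.

```python
def w_prime(ranks):
    """Adapted concordance for an n x n rank matrix with a blank diagonal.
    ranks[i][j] is the rank ranker i gives to person j; ranks[i][i] is None."""
    n = len(ranks)
    totals = [sum(ranks[i][j] for i in range(n) if i != j) for j in range(n)]
    mean_total = sum(totals) / n
    s = sum((t - mean_total) ** 2 for t in totals)   # assumed definition of S
    s_prime = s * n ** 2 / (n - 2) ** 2
    return 12 * s_prime / (n ** 3 * (n ** 2 - 1))

def unanimous_matrix(n):
    """Every ranker places the others in the same order (person 0 first)."""
    matrix = []
    for i in range(n):
        row, rank = [], 1
        for j in range(n):
            if i == j:
                row.append(None)
            else:
                row.append(rank)
                rank += 1
        matrix.append(row)
    return matrix

# Three rankers whose choices cancel out: all column totals are equal.
cyclic = [
    [None, 1, 2],
    [2, None, 1],
    [1, 2, None],
]

print(w_prime(unanimous_matrix(5)), w_prime(cyclic))
```

Under these assumptions a fully unanimous set of rankers yields a coefficient of 1 and a set with equal column totals yields 0, which is the behavior expected of a concordance coefficient.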

The advantage of using S’ instead
of S in the formula for W’ becomes
apparent when it is desired to take
tied rank scores into account which
are frequently present in matrices
of this type. In that case the formula
is corrected to

$$W' = \frac{S'}{\frac{1}{12}\, n^3 (n^2 - 1) - n \sum T}$$

where

$$T = \frac{1}{12} \sum_t (t^3 - t)$$

and t denotes a set of ties in any one
row.

This formula, though not its derivation,
has recently been again presented
in a paper of mine (3). It has
been found useful in the study of
small groups as an indicator of the
reliability of the rankers and of the
conspicuousness of the phenomena
ranked. It was, for example, found
that group members achieved significantly
higher concordance values
when judging group companions with
regard to their dominance status than
with regard to their popularity status.
The respective W’-values were [illegible]
and .57 in 20 small groups examined.
This difference was statistically
significant beyond the .01 level of
confidence.

1. WILLERMAN, B. The adaptation and use of
Kendall’s coefficient of concordance (W)
to sociometric-type rankings. Psychol.
Bull., 1955, 52, 132-133.
2. TAYLOR, F. K. Quantitative evaluation of
psycho-social phenomena in small


THEORIES OF VISUAL ACUITY AND THEIR
PHYSIOLOGICAL BASES


JOHN L. FALK 
University of Illinois 


THE PROBLEM OF VISUAL ACUITY

Visual acuity is defined as the
reciprocal of the minimum visual
angle measured in minutes of arc.
The assumption is that absolute size
or distance of the test-object is not
important, but only the angle it subtends
at the eye. There are some
indications in the literature that
acuity may actually be a function of
distance even when the visual angle
remains constant, when the distances
are less than one meter or so (13, 72).
This [...] is explicable in terms of
the varying efficiency of accommodation.

The most common acuity measures
are those dealing with the minimum
separable and the minimum visible.
Minimum separable tests deal with
the perception of a small gap between
two parallel bars. Minimum visible
tests present a single, fine line (or
[...]) for discrimination against a homogeneous
background. Actually, no sharp
distinction can be drawn between
these two categories. [...] enlarge
the [...] parallel bars of the minimum
separable test [...] point [...] with this
kind of test but with a minimum
visible test: the bars have become
the homogeneous background and the gap the
sole figure (the line to be discriminated).
Coincident with the above
transformation will be an improvement
in acuity or the ability to resolve
the stimulus figure. There are,
in fact, a variety of ways for measuring
visual acuity, more often than
not yielding results which are incommensurable
with one another. For
example, [...] single line, it is somewhat
less for [...] for two
bright bars is distinctly less well performed.

¹ The [...] Milner, McGill University [...]

The reader should be clear as to
the difference between acuity and
sensitivity. Sensitivity refers to the
capacity of the organism to respond
to small values of photic intensity,
while by acuity is meant the capacity
to distinguish (resolve) very fine or
very close details. The
sufficient stimulus for the rod receptors
in terms of radiant energy is
lower than that associated with the
cones; so that except for the red end
of the spectrum, where sensitivity is
about equal, rods are more sensitive
than cones. Chapanis (13) refers to
experiments which indicate that while
a given level of dark adaptation allows
light of a certain luminance to
be perceived (sensitivity) it does not
insure the ability to discriminate
forms at this luminance level 
(acuity). Also, under scotopic condi- 
tions the retinal region of maximum 
sensitivity does not correspond to 
the region of maximum acuity. Later
we shall consider certain structural 
features which contribute to acuity 
and sensitivity. 

Many factors have been shown to 
influence the degree of resolution: the 
luminance and size of the back- 
ground, surround luminance, wave- 
length, pupil-size, exposure time, 
retinal area stimulated, state of 
adaptation, etc. One is first inclined 
to attempt an explanation of acuity 
in terms of the projection of a geo- 
metrical image upon the retina. This 
interpretation must be rejected for 
two cardinal reasons. First, owing 
to its distortion the retinal image is 
far from being a pictorial replica of 
the visual world; and second, the 
retinal receptor mosaic, fine as it is 
in the central fovea, is not fine 
enough to account for the degree of 
visual resolution possible. However, 
as we shall presently indicate, a finer
receptor “grain” might not give rise 
to greater acuity at all. 

Let us discuss these factors of
image blurredness and receptor size 
in somewhat greater detail. The 
image is brought to focus on the 
light-sensitive retina by the cornea 
and the delicately adjusted accommo- 
dation of the lens. The pupil responds 
to the luminance of the field of view 
by widening or narrowing. At this
point any analogy to a camera must 
be abandoned. There is a spread of 
light from the geometrical image of a 
bright object to its dark background 
(or into a dark object from a brighter 
background) owing to the diffraction 
of light by the pupillary aperture, 
as well as to the spherical and chro- 
matic aberrations. For a pupil size 
of 2.5 mm. or less, the spread of light
is attributable almost entirely to
diffraction. For larger pupil sizes, 
the aberrations become important 
factors. Acuity has been found to 
increase up to a pupillary diameter 
of 2 mm., remain constant from 2 to 4 
mm., and then decrease owing to 
chromatic aberration.
There are still further respects in 
which the eye is quite unlike a cam- 
era. It is not light-tight, and allows 
light to pass through the sclerotic 
coat surrounding the cornea. There 
is a further diffraction and scattering 
of light in the internal media of the 
eye. Light must pass through retinal 
blood vessels, nerve fibers, and cell 
bodies in order to reach the rods and
cones. And there is reflection from
the formed image on the retina to
other parts of the retina. Bartley
(6, p. 58) claims that under many
conditions “the level of stray illumination
is a considerable fraction of
the intensity of the image itself.”
Further, “with a single disk [as
image], the major factor producing
the electroretinogram is fluctuation
in stray light. When stray light is
constant, the conditions for producing
a series of waves in the electroretinogram
are absent” (7, p. 926).

The photographic plate has an
even distribution of light-sensitive
substance upon its surface, while the
density of rods and cones varies according
to retinal region. There are
not only regional variations in the
type, size, and spacing of the retinal
receptors, but also the initial retinal
“grain” is substantially altered by
various synaptic relationships (e.g.,
the funneling of a group of rods onto
one ganglion cell). Again, the image,
the points of which are composed of
small “blur circles,” is focused exactly
only for a small central region,
the more peripheral areas of the
image being somewhat out of focus
(17). For cone vision it has been
found that light passing into the eye
obliquely at the edge of the pupil is 
a less effective stimulus than light 
entering through the center of the 
pupil. This directional sensitivity is 
termed the Stiles-Crawford effect 
and has been explained (57, 58) by 
considerations of cone shape, re- 
fractive index, and locus of photo- 
sensitive pigments. The adaptive 
range of the eye is over 1,000,000 to
1, but pupil area varies only through
a [...] range. Clearly, adaptation
must be explained by some additional
factors. Also, the Stiles-Crawford
effect has shown that even this pupil
change has less control over alterations
in retinal illuminance than was formerly
assumed (54, p. 403).

To this complex picture of illumination
we may add the complication
that the eyes are in a state of
constant fluttering motion termed
physiological nystagmus. This fine,
rapid tremor is to be distinguished
from the larger saccadic “flicks” and
slow “drifts” (18). Physiological
nystagmus occurs since the eyeball is
held in balance between pairs of antagonistic
muscles, and unlike the
larger movements, it is not binocularly
coordinated. The measurements
of tremor amplitude do not agree too
well among the various experimenters,
but the effect itself is well established.

Thus, despite accommodation of
the lens for maximum acuity, we
see that the image delivered to the
retina is blurred and distorted by
many factors: diffraction at the pupil,
spherical and chromatic aberration,
passage of extraneous light
through the sclerotic coat, diffraction
and scattering of light in the internal
media, the imposition of blood vessels
and nerve tissue between incoming
light and the receptors, reflection
from the formed image, differential
densities of receptor elements, lack
of sharp focus except at the image
center, the Stiles-Crawford effect,
and physiological nystagmus.²

We must now consider the second 
factor arguing against the interpre- 
tation of acuity by the geometry of 
images. This is the set of structural 
characteristics of the retinal mosaic 
in the light of acuity performances. 
According to the geometrical inter- 
pretation, for resolution of the details 
of a pattern to occur, the image 
formed on the retina must not have a
fineness of grain exceeding that ex- 
hibited by the diameter of the individ- 
ual retinal receptors. But Hecht 
and Mintz (41) found that a thin, 
dark line subtending 0.5” of visual 
arc could be resolved. This is roughly 
equivalent to seeing a 1/16-inch 
wire at a distance of half a mile. By 
the geometry of the situation, the 
image of this line (uncorrected for 
diffraction, etc.) would cover only 
about 1/40 of the diameter of the 
smallest foveal cones. The smallest 
cones in the fovea centralis are 1 to 
1.5μ in diameter and therefore subtend
a visual arc of about 12” to 18”
which is considerably larger than the 
0.5” minimum visible found by Hecht 
and Mintz. How is it possible for the 
eye to discriminate details finer than 
its own mosaic? One reason is that 
the 0.5” geometrical image is a fic- 
tion; the sharpest edge will spread its 
blurred image over at least four or
five foveal receptors. Also, it has
been suggested that within certain
limits, rate of firing may be “transposed
... into an ‘angular value,’
that is, interpreted dimensionally:
low intensities as small, high intensities
as large, angular sizes” (64,
p. 429).

² Some additional factors producing distortion
of the retinal image are given by
Tschermak-Seysenegg (76, pp. 7-10). Lest the
amount and complexities of retinal image distortion
appear too great for fine resolution
[...], it should be noted that the eye
[...] mechanisms which help in
correcting the aberrations: the differential
[...] density of the lens compensates
[...] yellow color [...]
reduces chromatic aberration by eliminating
the ultraviol[...]
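
The angular figures quoted in this section can be checked with elementary trigonometry. The sketch below is illustrative only: the posterior nodal distance of 17 mm used to convert retinal sizes into visual angles is an assumption not stated in the text, chosen because it is a common schematic-eye value, and the function name is hypothetical.

```python
import math

ARCSEC_PER_RADIAN = 180 * 3600 / math.pi

def subtended_arcsec(size, distance):
    """Visual angle in seconds of arc; size and distance in the same unit."""
    return 2 * math.atan(size / (2 * distance)) * ARCSEC_PER_RADIAN

# A 1/16-inch wire seen at half a mile (31,680 inches).
wire_arcsec = subtended_arcsec(1 / 16, 0.5 * 5280 * 12)

# Retinal sizes converted with an ASSUMED posterior nodal distance of 17 mm.
NODAL_DISTANCE_MM = 17.0
cone_small = subtended_arcsec(1.0e-3, NODAL_DISTANCE_MM)   # 1 micron cone
cone_large = subtended_arcsec(1.5e-3, NODAL_DISTANCE_MM)   # 1.5 micron cone

print(round(wire_arcsec, 2), round(cone_small, 1), round(cone_large, 1))
```

On these assumptions the wire subtends somewhat under half a second of arc, in rough agreement with the 0.5” threshold, and the smallest cones subtend about 12” to 18”, so the geometrical image of the line is a small fraction of one cone diameter.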

Summarizing, we may say that 
there are at least two problem para- 
doxes in the explanation of visual 
resolution. The image formed upon 
the retinal mosaic appears to be a 
rapidly vibrated blur. The mosaic 
itself, tight-packed with fine cells 
in the foveal region, is still too coarse 
geometrically to account for the 
fineness of visual acuity. However, 
in the conjunction of these two para- 
doxes investigators have seen various 
ways of solving both of them. These 
lines of thought will be discussed 
under the somewhat arbitrary head- 
ings of “static theories” and “dy- 
namic theories.” 

There are certain major factors in 
the problem of visual acuity which 
should be kept in mind, since it is 
interesting to see how the various 
theorists handle them: (a) the blurred 
character of the retinal image, (b) 
the continuous motion of the image 
due to tremor, (c) the diameter and 
spacing of the visual cells, (d) the 
modes of connection of the visual 
cells to the optic nerve, and the way 
in which the rest of the visual system 
is assumed to affect acuity.


THE SIZE AND DISTRIBUTION 
OF RETINAL RECEPTORS 


This section is intended to give the 
reader the barest outline of anatomi- 
cal features in the retina required to 
comprehend the theories to be dis- 
cussed. Other material of this nature 
will be discussed in the context of 
the theory utilizing it. 

The structural limitations of visual 
resolution, aside from the blurring of 
the image, begin with the size and 
separation of the receptor elements
in the retina. Since rods are absent
from the fovea centralis and are not 
implicated in photopic acuity they 
will be ignored in preference to the 
cones in this discussion. 

The thinnest cones are present in the very
center of the fovea, where their thickness, in
Man, is reduced to almost 1μ, corresponding
to a visual angle of approximately 12” to 15”
of arc.... The central territory, where the
cones are almost uniformly thick, measures
approximately 100μ across, corresponding to
20’... of arc... It contains approximately
50 cones in a line. This area seems to
be not exactly circular but elliptical, with the
long axis horizontal, and may contain altogether
2000 cones... here practically every
cone is individually linked with the ganglion
cells by means of its own “private” midget bipolar...
it represents an independent [...]
unit... the size of each of the 2000
receptor-conductor units measures, on the
average, 24” of arc (64, pp. 425-26).


At the edge of the fovea, the number
of cones per 100μ (20’ of arc) is
reduced to 30, and their size increases
to 40” of arc. This trend continues
as we move further and further into
the periphery, the cones becoming
larger and more sparse.

The cones of the fovea are mostly
hexagonal, and “... in certain directions
there are rows arranged in
straight or nearly straight lines; in
other directions this arrangement is
less pronounced or is altogether absent.
But even where regular, the
rows are straight on short stretches
only, soon becoming a tangle of [...]
groups of cones oriented differently”
(64, p. 427).

The cones are separated by minute
partitions of neuroglia, the thickness
of which is less than the diameter
of the thinnest cone. These partitions
vary from [...] in the very
center to ½μ (6” of visual arc) in the
periphery.

The fovea itself consists of a
shallow pit in the retinal surface of
primates. The particular visual purpose
served by this arrangement is a
subject of debate among experts. For
two divergent views see Polyak (64,
pp. 209-210) and Walls (77, pp. 183-184).

As might be suspected, the visual 
acuity of particular retinal areas is 
closely associated with the cone den- 
sity there (see 64, Fig. 99). Recent 
evidence (44, 57) indicates that our 
highest acuity is confined to a central 
region 35μ in diameter (7’ of arc).
Just beyond this tiny region (26
cones across, containing a total of
approximately 600 cones) there is a
[...] per cent decrement in acuity.


STATIC THEORIES 


Intensity Discrimination Theory (Hartridge)

Hartridge (35) presented details of
his theory of visual acuity in 1922,
and apparently still holds to the
main outlines of it, although admitting
that it was wrong in its quantitative
aspects (36). The point of departure
consists in considering the
acuity task to be a particular case of
intensity discrimination. The blurred
retinal image is regarded as a distribution
of intensities. With a minimum
visible bright line as test-object,
the cone at the center of the image
will receive light of greater intensity
than those cones on either side of it.
This threshold intensity difference
was calculated by means of Rayleigh’s
equations, which take account
of the diffraction and aberration characteristics
of light, and was found to
approximate 10 per cent. Thus, the
eye actually is considered to turn the
blurredness of the retinal image to its
advantage. In contrast to the geometrical
interpretation of visual acuity,
Hartridge’s theory posits that
“the diffraction pattern of an
image formed on the retina must not
have differences of intensity which
are below the threshold of difference
perception of the retinal receptors”
(36). As shown in Figure 1, the only
requirement for resolution of the
minimum separable gap is that the
illuminance incident at the central
cone row be perceptibly different
from that received by rows on either
side.


Intensity Discrimination Theory (Hecht)


Hecht starts with the fact that 
minimum separable acuity varies 
with the intensity of illumination, 
and utilizing the analogy of the pho- 
tographic plate, states that the fine- 
ness of detail which the retina can 
register is dependent upon the density 
of active receiving elements. By 
combining this conception of the 
visual system with the assumption 
that the sensitivity of receptors is 
randomly distributed,³ Hecht (39) is
able to give an account of the increase 
in acuity with increased field lumi- 
nance. The integral of the Gaussian 
distribution curve is an S-shaped 
function which is fitted to König’s
intensity-acuity data. According to 
Hecht (38, 39) an increase in photic 
intensity will give rise to an increased 
number of retinal elements which are 
functional, thereby effecting a reduc- 
tion in the average center-to-center 
distance of active elements and in- 
creasing the resolving power of the 
surface. Visual acuity, then, “varies
directly with the number of func- 
tional rods or cones in a unit area of 
illuminated retina” (38). 
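
Hecht's account can be sketched in a few lines. The sketch assumes, beyond what the text states, that receptor thresholds are normally distributed on a log-intensity scale and that acuity is simply proportional to the fraction of receptors whose threshold is exceeded; the mean and standard deviation are hypothetical, and no attempt is made to fit König's data.

```python
import math

def fraction_active(log_intensity, mean_log_threshold=0.0, sd_log_threshold=1.0):
    """Integral of the Gaussian threshold distribution: the share of
    receptors whose threshold lies below the given log intensity."""
    z = (log_intensity - mean_log_threshold) / sd_log_threshold
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def relative_acuity(log_intensity, **params):
    """Acuity taken as directly proportional to the density of functional
    receptors, expressed as a fraction of its maximum (an assumption)."""
    return fraction_active(log_intensity, **params)

levels = [-3, -2, -1, 0, 1, 2, 3]   # log intensity, arbitrary units
curve = [(level, relative_acuity(level)) for level in levels]
for level, acuity in curve:
    print(level, round(acuity, 3))
```

The resulting curve is the S-shaped function referred to above: acuity rises slowly at low intensities, steeply near the mean threshold, and levels off once nearly all receptors are functional.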

Like Hartridge, Hecht considers 
the image on the retina as a pattern 
of intensities. If one row of cones is 
subjected to an intensity difference 
amounting to only 1 per cent with 
respect to adjacent rows, then this 
difference will be perceived. Since
only one cone row is differentially
affected, the line (in the case of the
minimum visible) will be sharp
rather than fuzzy⁴ (41). Also the
intensity difference between the line
and the background retinal illuminance
is imposed on different “center
rows” of cones in a short time interval
owing to eye movements.

³ This is assumed for both rod and cone
populations, although the rods are regarded as
having lower thresholds in general.

Finally, Hecht and Mintz (41) 
note that an intensity difference of 1 
per cent is near the limit of differen- 
tial intensity discrimination. They 
suggest that the just resolvable visual 
angle varies with light intensity in 
the same fashion as the capacity to 
discriminate intensity differences var- 
ies with intensity. 

Critique of Hecht’s theory. In view 
of the wide influence which this 
theory has exerted upon current con- 
ceptions of visual acuity, it may be 

4 From what we have already noted above
concerning Polyak’s denial of the existence of
straight “cone rows” it would seem necessary
to explain the sharpness of fine resolution upon
some other basis.

well to indicate some of the evidence
which fails to support it.⁵

Byram (12) has pointed out that 
the equations which Rayleigh de- 
rived for the distribution of light in- 
tensity on the retina in the case of a 
long, black line on a uniformly bright 
field are valid only for a rectangular 
aperture. Since the pupil is circular, 
the calculations of Hecht and Mintz 
err by 15 per cent. Hartridge’s error 
amounts to some 200 per cent. 

Lythgoe (49) has shown that the 
classical, sigmoid intensity-acuity 
curve obtained by König errs at both
the high and low ends owing to a
failure to maintain a surround lumi-
nance comparable to that of the test-
object background. Lythgoe was
able to reproduce König’s results at
the high end (a leveling off or slight 
falling off of acuity at high task 


5 The reader who desires additional critical
material on Hecht's theory should consult the
excellent review articles by Senders (70) and
Walls (78).


FIG. 1. AN ILLUSTRATION OF THE RETINAL BASIS OF MINIMUM SEPARABLE DISCRIMINATION ACCORDING TO HARTRIDGE’S THEORY (MODIFIED FROM WALLS [78]). Panel labels: bars, illuminance distribution, cones.




luminances) with inadequate sur- 
round luminance; but when adequate 
surround luminance was maintained, 
visual acuity continued to increase 
with increasing luminance of the 
acuity task. Thus, any theory which 
views visual acuity as explicable 
solely in terms of brightness dis- 
crimination (e.g. Hecht's) is faced 
with the following problem: the dif- 
ferential brightness threshold ceases 
to improve at high brightnesses (and 
perhaps deteriorates) while acuity
continues to increase up to the high- 
est luminance value used by any ex- 
perimenter! Moon (54) describes a 
study by Eguchi in which acuity con- 
tinued to rise up to values of lumi- 
nosity twice as great as that of white 
paper in direct noon sunlight. Moon 
notes further that “. . . neither
Lythgoe nor Eguchi succeeded, at
the highest values of [luminosity], in
illuminating the surround to the
luminosity of the test-object back-
ground. Thus we have no proof that
the [acuity-luminosity] relation may not
continue upward considerably be-
yond the point at which even the
Eguchi tests indicate a bend” (54,
p. 435).

Hecht’s theory applies to König’s
curve, which is flat-topped and does
not correctly represent the intensity-
acuity relation under more ade-
quately controlled conditions. Like-
wise, at the low-intensity end of the
curve Lythgoe (49) obtained no flat-
tening, a result which indicates
that the sharp discontinuity of the
rod and cone limbs of the curve may
not actually occur.

The assumption of a random dis-
tribution of thresholds for retinal
elements has been called into ques-
tion by many investigators. Few
have been willing to accept the view
that receptor thresholds vary so widely.
In the light of the results of Lythgoe
and Eguchi one would be forced to
conceive of the retina as a sensitive
surface in which the thresholds of 
some of its elements have never been 
reached. Fine as the retinal grain is, 
the theorist is hard put to explain 
the greater fineness of acuity per- 
formances, let alone ruling out a 
considerable part of the mosaic for 
having thresholds so high that they 
would seldom (if ever) be attained. 

Hecht has been criticized for all but 
disregarding an important dimension 
of neural action, namely, that the 
frequency of neural discharge varies 
with stimulus intensity. Although 
there is undoubtedly some variation 
among cone thresholds, the most 
likely candidate for the representa- 
tion of intensity level is the relative 
discharge rate passed on to optic 
nerve fibers rather than changes in 
the absolute number of receptors ex- 
cited (3, 33). The receptor responds 
in a graded rather than in the all-or- 
none fashion Hecht assumes. 

Actually, the assumption of widely 
separated thresholds in percipient- 
element populations gives rise to 
some rather peculiar implications. 
For example, consider the discrimina- 
tion of a small, dark spot on a moder- 
ately illuminated background. The 
dark area will produce an effect of 
illuminance decrement on a number 
of cones and the dark spot will be per- 
ceived. But according to Hecht’s 
theory there must be many such areas 
under moderate illumination condi- 
tions since the thresholds of many of 
the cones have not yet been reached. 
The homogeneous background would 
then appear as a rather spotty, 
blotchy expanse in which the retinal 
representation of the true dark spot 
would be indistinguishable from small 
clusters of elements with higher 
thresholds. Needless to say, such 
circumstances do not exist. 

The Hecht theory is unifactorial— 
only intensity is assumed to affect 
acuity. Other factors influencing
acuity performances have been neg- 
lected (e.g., size and shape of the 
test-object). The fact that increasing 
luminance gives rise to only a slight 
change in vernier acuity (aligning 
power) presents a further difficulty
for the theory (1, 9). Both Berger (8)
and Ogle (61) note that with two 
point sources on a dark background 
the minimum angle of resolution in- 
creases (resolving power decreases) 
as the luminance of the points is in- 
creased. Ogle (61) has explored this 
effect using various background lumi-
nances and concludes: “. . . it would
appear that Hecht’s theory for the in- 
crease of visual acuity with illumina- 
tion does not apply, because the 
MAR [minimum angle of resolution] 
was apparently a function of lumi- 
nance ratio or contrast only.” 
Finally, O’Brien and O’Brien (60) 
have subjected Hecht’s assumption of 
large, fixed threshold differences 
among receptor elements to a fairly 
direct test. Hecht's theory requires 
a variation among cone thresholds 
greater than one thousandfold. These
investigators, using a double star
test-object, contrived to concentrate 
approximately three-quarters of the 
luminous flux of the star image upon 
only three foveal cones. Consider- 
ing diffraction, etc., this degree of 
concentration is probably close to
the maximum possible for acuity 
tests. At surround luminance from 
.01 to 100 foot-lamberts, “the star 


6 Parenthetically, it is worth mentioning
that visual acuity is a complex function of a
host of conditions. We choose to view movies
in the dark where our acuity is poorer only
because the contrast conditions are so much
better (48, p. 260). But on the other hand,
Ogle (61) has shown that visual acuity de-
teriorates with increased point luminance on a
constant background (increased contrast) and
improves with increased luminance of back-
ground when the point luminances are con-
stant (decreased contrast).

illumination necessary for visibility
of both stars was never more than 
twice the illumination at which 
neither was visible. It is concluded 
that the small variation among foveal 
cones here observed is inconsistent 
with the Hecht theory.” 


DYNAMIC THEORIES 


The “dynamic” theories of visual 
acuity differ from the “static” theo- 
ries in one important respect: while 
Hartridge and Hecht give scant at- 
tention to eye movements, the physi- 
ological nystagmus (continuous, fine 
ocular tremor) is given an important 
role in augmenting resolution per-
formances by certain theorists.


The Theory of Weymouth et al.


This theory is an attempt to ex-
plain how vernier acuity operates
with such remarkable sharpness and
accuracy. Hair lines appear straight
and clear despite the irregular orien-
tation of “cone rows,” and vernier
acuity is accurate to 2”, or somewhat
less than one-tenth of the width of a
central foveal cone. Weymouth and
his co-workers (4, 82) make use of two
assumptions which were held by Her-
ing: first, that each cone has a local
sign, and second, that physiological
nystagmus contributes to acuity.
Actually, the second of these is not
an assumption in the strict sense for
these investigators, since it is derived
from some more basic aspects of the
situation at the retina. If two adja-
cent cones are stimulated they are
assumed mutually to affect each
other's local sign, the net effect being
to shift the pooled local signs to some
intermediate position. Thus, nystag-
mus would give rise to a process of
successive averaging, the product of
which is termed “retinal mean local
sign.”

The following, then, is a picture of
vernier resolution as it would be ac-
counted for by this theory. Any cone
“touched” by light is assumed to be
stimulated (4). Consider an edge be-
tween the light and dark portions of a
visual field. The cones which fall
along the retinal image of such an
edge would form a staggered and ir-
regularly spaced line, as photomicro-
graphs reveal. The mutual influence
of the local signs of such cones will
tend to straighten this edge some-
what, and the horizontal nystagmus
movements sweeping this diffraction
edge over a wider strip of cones will
enhance this dynamic averaging proc-
ess. Since equal numbers of stimu-
lated cones will come to lie in all
parts of the strip swept out by the
diffraction edge under the control of
nystagmus, the center of this strip
will represent the “center of gravity”
of all the stimulated cones.

The average or mean of these points, which
is therefore the local sign of the straight line,
is not limited to such units as inter-
conal distance, but may be ac-
curate to a small fraction of these units just as
the mean of a number of measurements made
in inches may be accurate to a small fraction
of an inch (italics mine).


For vernier perception to attain
maximum accuracy, the line must be
long enough so that a large sample of
cones may be averaged. It is a fact that
increasing the line length up to a cer-
tain point does aid vernier acuity.
According to this view, acuity is rela-
tively independent of the size and
spacing of the retinal elements, the
dynamic averaging process func-
tioning by producing a finer “grained”
mosaic for line representation and
alignment.
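
The averaging argument can be illustrated with a small simulation. The sketch below is not Weymouth's own computation: it simply assumes that each stimulated cone along a line is displaced from the true edge by a random amount of up to half an intercone distance (a uniform scatter chosen for illustration) and asks how precisely the mean of those positions locates the edge as more cones are sampled, that is, as the line is lengthened.

```python
import random
import statistics

def mean_local_sign(n_cones, rng, jitter=0.5):
    """Average lateral position of n_cones scattered about a true edge at 0.

    Positions are in units of the intercone distance; the uniform scatter of
    +/- jitter is an illustrative assumption, not a measured distribution.
    """
    positions = [rng.uniform(-jitter, jitter) for _ in range(n_cones)]
    return statistics.fmean(positions)

def spread_of_estimate(n_cones, trials=2000, seed=0):
    """Standard deviation of the mean local sign over many random mosaics."""
    rng = random.Random(seed)
    estimates = [mean_local_sign(n_cones, rng) for _ in range(trials)]
    return statistics.pstdev(estimates)

# Longer lines sample more cones; the spread of the mean should shrink
# roughly as 1 / sqrt(n_cones).
for n in (1, 4, 16, 64, 256):
    print(n, round(spread_of_estimate(n), 3))
```

Under these assumptions the mean becomes accurate to a small fraction of the intercone distance once a few dozen cones contribute, which is the sense in which the theory makes acuity relatively independent of the retinal grain.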


The Theory of Marshall and Talbot


The aim of these authors is to de-
velop a theory of visual acuity based
upon factors of the nervous system
which have presented embarrassing
complications to past theories rather
than helpful mechanisms. This is the 
only current acuity theory which in- 
volves the anatomy of the visual pro- 
jection system up to Brodmann’s 
Area 17, as well as many neurophysi- 
ological phenomena ignored by pre- 
vious theories. Just as the blurring 
of the retinal image by diffraction, 
aberrations, etc. was utilized by the 
“static” theories to explain the ex- 
treme fineness of acuity, Marshall 
and Talbot (53) carry the process on 
several steps by taking account of 
various neurophysiological phenom- 
ena such as facilitation, summation, 
the neural recovery cycle, overlap 
and multiplication of pathways. 
First, let us consider two anatomi- 
cal factors of great importance to the 
theoretical presentation: reciprocal 
overlap and multiplication of path- 
way. The principle of reciprocal over- 
lap is easily grasped by relating the 
diagrammatic representation of Fig- 
ure 2 to the following description: 
“In the cat, optic tract endings in the
geniculate divide into several 
branches and as many as 40 ring- 
shaped boutons have been seen on 
single radiation cells which may come 
from as many as 10 optic tract fibers” 
(53). Multiplication of pathway 
simply refers to the fact that in the 
monkey, for example, “the optic tract 
fibers seem to divide into 5 or 6 
branches, and each branch ends in a 
bouton which makes contact with a 
different radiation cell, resulting in
multiplication of the transmitting 
path” (53). Such multiplication, as 
the visual system ascends to the cor- 
tex, provides a probability-distribu- 
tion of cortical cells for each foveal 
cone. Each receptor, then, may acti- 
vate different paths to different corti- 
cal cells at different times. This ex- 
pansion or refinement of “grain” 
from retina to cortex is independent 
of the functional and anatomical 
spread due to reciprocal overlap.
While it is admitted that overlap is 
minimal for neurons ascending from 
the fovea centralis and is more char- 
acteristic of the periphery, it is never- 
theless maintained on both anatomi- 
cal (64, p. 430) and experimental (30) 
grounds that possibilities for such 
interaction exist. 

As the retinal receptors are vi- 
brated back and forth across the con- 
tours of the diffraction image of a 
test-object by the small-amplitude, 
high-frequency action of physiologi- 
cal nystagmus, they are subjected to 
rapidly changing illuminance values. 
Not only the amplitude of the illumi- 
nance change but also its rate of 
change will determine the magnitude 
of receptor outputs. The experiments 
of Hartline and Graham (33) on the 
photoreceptors of Limulus have 
shown rate of stimulus onset as well 
as intensity to be 
terminant of hi 


d but rapidly 
harge rate de- 
the stimulus. 
ould serve to 
discharge rate 
illuminance 
tor “rows” 


Fic, 2. A DIAGRAMMATIC REPRESEN 


IN THE LATERAL Grny 
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over frequently occurring cycles. 
Nystagmus is assumed to fluctuate 
“with approximately a statistical oc- 
currence of various amplitudes” (53). 
Thus the small illuminance difference 
presented by a hair-line test-object 
might be made liminal by the rapid- 
ity of its onset over sets of receptors 
which are stimulated intermittently, 
thus by-passing the effects of neural 
accommodation. As Marshall and 
Talbot (53) remark: “This essential 
discontinuity of stimulation is neces- 
sary for continuous vision through @ 
fatiguable system.” . 

We are now in a position to illus-
trate how the above factors, in com-
bination with the excitability changes
of the neural recovery cycle, might
work to produce certain gradients of
excitation which become progres-
sively refined and sharpened as they
are passed on to higher synaptic
levels. Confining our discussion to
the retinal level for the present, we
shall consider three factors whose
salient features and modes of opera-
tion have been described above: (a)
the intensity distribution (cf. diffrac-
tion, etc.) of the retinal image
termed the diffraction image, (b)
physiological nystagmus, and (c)
rate of illuminance change.

FIG. 3. THE DISTRIBUTION OF ILLUMINANCE ON THE RETINA ACROSS THE GEOMETRICAL BOUNDARY (MN) SEPARATING LIGHT AND DARK HALVES OF A FIELD (MODIFIED FROM JONES AND HIGGINS)

In Figure 3 the intensity distribu-
tion of the diffraction image of a bor-
der between the light and dark sides
of a bipartite field is shown (curve
A). The geometrical border is MN.
The figure is based upon one given by
Jones and Higgins (44) who apply
the conceptions of Marshall
and Talbot in their treatment of
photographic granularity and graini-
ness. B indicates a hypothetical
retinal cone distribution (a center-to-
center distance of 1.5μ is assumed).
The distance between X and Y repre-
sents the best recent estimate (see
below) of the average amplitude of
physiological nystagmus. X and Y
mark the limits of tremor for cone a.
Thus the diffraction image of the
geometrical border MN is spread
over the cones with the graded illumi-
nance of curve A. Tremor XY will,
by moving the cones with respect to
the diffraction image, subject them
to different rates of change of illumi-
nance. For example, if the bank of
cones shown in Figure 3 moves to the
right from the position shown, cone
a will receive the greatest rate of
change of illuminance since it lies on
the steepest portion of curve A. A cone
intersecting a less steep portion of
the illuminance distribution will be

subjected to a less rapid stimulus on- 
set, and consequently produce a 
somewhat lower neural discharge fre- 
quency. From an inspection of Fig- 
ure 3, it can be seen that the middle 
cones (a, b, −b) will receive the
greatest over-all illuminance changes, 
with cone a, on the average, receiving 
the largest rate of illuminance change. 
Considering the complete cone bank 
B, the differences in both amplitude 
and rate of illuminance change aris- 
ing from ocular tremor will create a 
gradient of discharge frequency over 
the receptors which is peaked in the 
center. 
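
The peaking described above can be sketched numerically. In the illustration below the blurred border (curve A) is represented by a logistic function, the tremor by a sinusoid sweeping a total of four cone spacings, and each cone by a single point; all three are simplifying assumptions introduced here, and the point-receptor simplification in particular ignores the role of receptor width discussed in the next paragraph. For each cone the sketch reports the total swing in illuminance and the peak rate of illuminance change over one tremor cycle.

```python
import math

def edge_illuminance(x, blur=1.0):
    """Relative illuminance across a blurred light-dark border centered at x = 0.

    A logistic profile stands in for curve A (an assumption); x is in cone spacings.
    """
    return 1.0 / (1.0 + math.exp(-x / blur))

def tremor_effects(rest_position, half_amplitude=2.0, frequency_hz=50.0, steps=2000):
    """Return (illuminance swing, peak rate of change per sec) for one cone.

    The cone rests at rest_position and is carried sinusoidally across the
    border by tremor; one full tremor cycle is sampled in `steps` intervals.
    """
    period = 1.0 / frequency_hz
    dt = period / steps
    samples = []
    for k in range(steps + 1):
        shift = half_amplitude * math.sin(2.0 * math.pi * frequency_hz * k * dt)
        samples.append(edge_illuminance(rest_position + shift))
    swing = max(samples) - min(samples)
    peak_rate = max(abs(b - a) / dt for a, b in zip(samples, samples[1:]))
    return swing, peak_rate

# Cone bank B: labels follow the text, positions are in cone spacings from MN.
cones = {"-c": -2, "-b": -1, "a": 0, "b": 1, "c": 2}
for label, position in cones.items():
    swing, rate = tremor_effects(position)
    print(f"cone {label:>2}: swing {swing:.3f}, peak rate {rate:.1f}")
```

With these assumed values both quantities are largest for cone a and fall off symmetrically toward the flanks, which corresponds to the centrally peaked gradient of discharge frequency that the theory requires.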

At the retinal level, then, the Mar- 
shall-Talbot theory provides one 
mechanism for the neural peaking of 
borders. Before passing on to higher 
synaptic levels, it should be noted 
that, unlike the Weymouth theory, 
receptor size plays an important role 
here. Cone size is one factor deter- 
mining the rate of illuminance 
change. As Marshall and Talbot (53) 
put it: “Smaller receptors would be 
useless, because though traversing 
the optical gradient oftener, they 
would gather proportionally less 
brightness differential. The limiting 
retinal factor in acuity seems to be 
the relation of receptor width to the
highest optical gradient in a moving 
pattern, rather than the average 
static differential illumination on one 
cone, compared with its neighbors.” 
The diffraction image, peaked at 
the retinal level by differential mag- 
nitudes and rates of illuminance 
change owing to the nature of the 
illuminance distribution and nystag- 
mus motion, is passed on to the lat- 
eral geniculate. By referring to Fig- 
ure 2, it can be seen that multiplica- 
tion of pathway and reciprocal over- 
lap will provide further peaking for 
submaximal reactions. If the system 
is not saturated the greatest reac- 
tion density will occur at the center 
of the reacting group, for it is here 
that both spatial and temporal sum- 
mation will be greatest owing to the 
peaked character of the in-coming 
discharge frequency pattern. Path-
way multiplication and overlap serve
to refine the mosaic “in proportion to
the sharper gradients and peaks pro- 
duced, as sand forms sharper peaks 
than bricks” (53). 
To complete the picture, we must 
consider yet another neurophysio- 
logical mechanism: the neural re- 
covery cycle. Observations on the re- 
covery cycle at the lateral geniculate 
of the cat show that the second of two 
stimuli of weak or moderate intensity 
“produces an enhanced postsynaptic
spike, during the first 10 to 30 
msec.... This mixture of super- 
normality, facilitation, recruitment, 
or summation is succeeded by a 
longer period of depression or sub- 
normality” (53). The recovery cycle 
functions “to break up neural ac- 
tivity into temporally discontinuous 
transmission,” and the summation 
possible during the supernormal pe- 
riod will serve to amplify the gradients 
oscillating at the lateral geniculate 
level. Utilizing the observations of 
Adler and Fliegelman (2) on physio-
logical nystagmus, Marshall and
Talbot (53) state that the periods 
and amplitudes of this action “yield 
velocities of transit across the mosaic 
which stimulate the receptors dis- 
continuously, at intervals important 
in the neural recovery cycle.” 

To clarify the manner in which the 
recovery cycle at the lateral genicu- 
late might serve to emphasize inten- 
sity differentials at borders, consider 
what might possibly occur as cone
bank B (Fig. 4) oscillates back and
forth. With the bank at the extreme
right position (cone a at Y), cones b
and c have just recently given their
responses while cones −b and −c
have responded somewhat earlier.
As bank B begins to shift to the left
on the second half of the cycle, the
geniculate correlates of cones b and
c will be responding to the onset of
illuminance on cones b and c in the
supernormal portion of their recovery
cycle. Therefore, early in the second
half-cycle of nystagmus, the genicu-
late correlates will show summation
(peaking) according to the Marshall-
Talbot scheme. By the time cones
−b and −c reach the steep portion
of A their geniculate correlates will be
in the subnormal phase so that the
activity arising from these cones un-
dergoes suppression at the genicu-
late. On other cycles, the geniculate
correlates of −b and −c will be
amplified while b and c are sup-
pressed. The correlate of a will tend
to be somewhat suppressed com-
pared to (b, c) and (−b, −c) since
recruitment at the geniculate is in-
versely proportional to the strength
of the conditioning stimulus, and a
receives, on the average, a stronger
stimulus on its conditioning
cycle than b, c, −b, or −c. This
means that the gradients of activity
at the borders of bar C will be sharp-
ened and acuity enhanced.
FIG. 4. DISTRIBUTION OF ILLUMINANCE (A) ON THE RETINA FOR A BRIGHT BAR (C) ON A DARK FIELD. B IS A BANK OF CONES, XY THE LIMITS OF PHYSIOLOGICAL NYSTAGMUS

It has already been indicated how
the mechanism of pathway multi-
plication and overlap in combination
with a fluttering neural “image”
would tend to produce ever sharpen-
ing gradients at each synaptic level
for borders and edges. The cortical
mosaic is considered to be of a much
finer grain than the retinal (see de-
scription of pathway multiplication
above): “. . . quantitatively the unit
paths near central vision should now
be regarded not as lines, but as ex-
panding cylinders whose ends bear
an area ratio of 1:10,000, and a cellu-
lar ratio of perhaps 1:100” (53).
Resolution may be explained
by the receptor stimulation
produced by ocular tremor “scan-
ning,” which ultimately gives rise to
peaks of activity in the finer cortical
mosaic. Since cortical excitation
peaks occur upon a greatly expanded
scale after successive refinement
and sharpening at other synaptic
levels, the accuracy of spatial locali-
zation will be more precise than that
afforded by the coarser retinal mo-
saic. Reciprocal overlap “. . . while
broadening the base of a local reac-
tion, allows for small shifts of reac-
tion peak if the system is not satu-
rated” (53). Thus, stimuli having a
spatial separation much smaller than


the center-to-center distance be- 
tween adjacent foveal cones will be 
resolved; e.g., the center-to-center 
distance of cones at the fovea cen- 
tralis is approximately 20” while 
vernier acuity is of the order of 2”. 


AN EVALUATION OF FACTORS
INVOLVED IN THE MARSHALL- 
TALBOT THEORY 


Since various aspects of the Mar- 
shall-Talbot theory have been uti- 
lized to account for other visual phe- 
nomena besides acuity (see 37, 44, 
62, 63), and because it is the most 
comprehensive and sophisticated acu- 
ity theory to date, an evaluation of 
the major physiological mechanisms 
upon which the theory rests will be 
attempted. 

Physiological Nystagmus and Acuity 

The small-amplitude, high-fre- 
quency, involuntary flutter of the 
eyes during “steady” fixation plays 
an important role in dynamic theories 
and particularly in the Marshall-Tal- 
bot theory. Recent evidence on the 
characteristics of this motion will be 
described, and relevant aspects of the 
theory will be evaluated in the light 
of this material. Experiments at- 
tempting to trace the influence of 
fixation flutter upon acuity will be
discussed.
Adler and Fliegelman (2) describe 
physiological nystagmus as having a 
mean amplitude of 2’ 14” and a fre-
quency of 50 to 100 per sec. It is
these data which are used by both 
Marshall and Talbot (53) and Jones 
and Higgins (44) in their theoretical 
treatments. Ratliff and Riggs (66) 
recalculated the data presented by 
Adler and Fliegelman and arrived at 
a value of approximately 1’ for mean 
tremor amplitude. In their investiga- 
tion, Ratliff and Riggs (66) found the 
median tremor to be 17.5”, with a 
range from just perceptible move- 
ments to almost 2’ of arc. Tremors 
greater than 1’ were rare. The fre- 
quency ranged from 30 to 70 cycles 
per sec. Other measurements of phys- 
iological nystagmus by Barlow (5) 
and Ditchburn and Ginsborg (18) are 
in good agreement with these values. 
However, another recent investiga- 
tion by Higgins and Stultz (43) 
yielded a median tremor amplitude of 
1.0’, with a mean at 1.2’. In terms 
of Polyak’s histological description 
of the fovea centralis (see above), 
the difference between the results of 
Ratliff and Riggs (66) and Higgins 
and Stultz (43) is roughly equivalent
to a flutter motion across one cone 
versus motion across four cones. In 
Figures 3 and 4 we have assumed a 
tremor motion across 4 cones, for 
illustrative purposes, in accordance
with the findings of Higgins and 
Stultz. This may or may not be an 
overestimate, but in any case, the 
combination of tremor, drift, and 
flick movements makes it appear
“unlikely that any point of an image 
can remain on a single receptor for 
more than a few hundredths of a 
sec.” (66). There do not seem to be 
any differences in the amplitude or 
frequency of tremor under changes in 
test-object shape (66), visibility, or
monocular versus binocular fixation
(43). This squares with Walls’ (79) 
suggestion that the tremor is “due 
to proprioceptive feedback from the 
muscles, going no higher than the 
cranial-nerve nuclei themselves.”
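
The comparison between tremor amplitude and cone spacing made in this paragraph can be checked with simple arithmetic. The sketch below converts the several amplitude estimates to seconds of arc and divides by the center-to-center distance of foveal cones, taken as the approximately 20” quoted earlier in this review; the helper names are introduced here for illustration, and the result depends directly on that assumed spacing.

```python
CONE_SPACING_ARCSEC = 20.0  # approximate foveal center-to-center distance quoted in this review

def arcsec(minutes=0.0, seconds=0.0):
    """Convert minutes and seconds of arc to seconds of arc."""
    return 60.0 * minutes + seconds

# Tremor amplitude estimates cited in the text.
tremor_estimates = {
    "Adler and Fliegelman (2), mean": arcsec(minutes=2, seconds=14),
    "Ratliff and Riggs (66), median": arcsec(seconds=17.5),
    "Higgins and Stultz (43), median": arcsec(minutes=1.0),
    "Higgins and Stultz (43), mean": arcsec(minutes=1.2),
}

def cones_crossed(amplitude_arcsec, spacing=CONE_SPACING_ARCSEC):
    """Tremor excursion expressed in cone center-to-center distances."""
    return amplitude_arcsec / spacing

for source, amplitude in tremor_estimates.items():
    print(f"{source}: {amplitude:.1f} arcsec = {cones_crossed(amplitude):.2f} cone spacings")
```

On this reckoning the Ratliff and Riggs median corresponds to somewhat less than one cone spacing and the Higgins and Stultz values to between three and four, in line with the rough "one cone versus four cones" contrast drawn above.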
Summarizing: Investigations show 
fairly good agreement on the fre- 
quency of ocular tremor, but the 
amplitude is certainly smaller than 
the 2’ 14” estimate of Adler and 
Fliegelman (2). Ratliff and Riggs 
(66), using a contact lens arrange- 
ment, obtained amplitude values less 
than one-third as large as those found
by Higgins and Stultz (43) who
directly photographed the excursions
of a small scleral blood vessel. The
measurements are technically difficult
and the source of the experimental
differences may lie in the type of re-
cording device utilized.
The effect of an over-all smaller
tremor amplitude upon the Marshall-
Talbot theory is difficult to estimate.
Barlow (5) feels that his small ampli-
tude results are evidence against the
theory, while Ditchburn and Gins-
borg (18), with comparable findings,
regard them as compatible with (al-
though not directly supporting) the
Marshall-Talbot theory. These lat-
ter investigators also note that the
tremor is highly irregular; Marshall
and Talbot assume a statistical dis-
tribution of the amplitudes. We have
already explained how they also con-
sider the tremor characteristics, as
described by Adler and Fliegelman
(2), to be rather precisely related to
the neural recovery cycle at the lat-
eral geniculate. However, Marshall
and Talbot do not work out any ex-
plicit, quantitative relations here,
and in another source (52) describe
the relationship in a way which
seems to involve other eye move-
ments besides tremor. On the face of
it, the posited mode of operation of
this mechanism seems highly im-
probable, for a statistical occurrence
of both horizontal and vertical com-
ponents of tremor imposed upon
drifting movements can scarcely
mesh with the phases of the recovery
cycle over the area thus involved.
Before the various movements could
“average up” a sharpened set of
gradients with the aid of recovery
cycles, a large saccadic flick would
shift the crucial portions of the dif-
fraction image to a new portion of the
cone mosaic.
But ocular tremor is also made to
serve three other purposes: (a) The
refinement and peaking of gradients
brought about by physiological nys-
tagmus interacting with reciprocal
overlap at higher levels (independ-
ently of recovery cycle dynamics)
would presumably still be operative
for tremor of 1’ amplitude. (b)
The image flutter could still average
out a precise “retinal mean local
sign” over time. (c) Ocular tremor
is also assigned the role of converting
the spatial illuminance gradients of
the diffraction image into temporal
illuminance gradients on the recep-
tors. The fluttering motion would
provide rapid rate changes in stimu-
lation, thus producing high
receptor outputs and, as a result, bet-
ter acuity.
There has been no direct experi-
mental test of a, but b and c have
begun to be investigated. Riggs
and Ratliff (68) describe experiments
that appear to confirm the role of
some sort of mean local sign in
binocular vision. Stereoscopic acuity
measures reveal that a difference of
. . . of arc between the images on the
two retinas could be utilized as an accurate
depth cue, yet median tremor ac-
cording to these investigators is
some 17.5” and is not binocularly
coordinated. Apparently the con-
ception of corresponding retinal
points must be revised to include
processes of temporal integration
producing a mean retinal location. 
Riggs and Ratliff (68) conclude “that 
both spatial and temporal patterns of 
impulses from the two eyes must 
somehow be combined centrally.” 

O'Brien and O’Brien (59) describe 
an experiment which purports to 
eliminate tremor as a factor in visual 
acuity. However, the cogency of the 
results has been justifiably ques- 
tioned by Teuber and Bender (75). 
In a later paper, O’Brien (57) is of 
the opinion that final judgment on 
the tremor-acuity relationship should 
be deferred. 

That the edges are more important 
determinants of acuity than the area 
of the test-object has been revealed 
by a number of investigators (10, 21, 
47). The relation of this fact to 
tremor-scanning of gradients is ob- 
vious. 

Ratliff (65) tested monocular 
acuity during a test-object exposure 
of 75 msec. in order to ascertain the 
effect of eye movements upon acuity. 
Drift movements during exposure of 
the test-object were found to be 
detrimental to acuity, and incorrect 
responses were significantly related 
to larger amplitudes of tremor. One 
might expect drift to decrease acuity 
by changing the location of the scan- 
ning operation of tremor, but the 
fact that larger amounts of tremor are 
associated with poorer acuity does 
not confirm the “dynamic” theories.
This result is compatible with static, 
intensity-discrimination theories in 
which acuity arises from the simul- 
taneous differential responses of a 
set of adjacent receptors. But that 
the functionally immobile eye is not 
only not the most efficient condition 
for seeing, but is in general quite a 
poor one, has been recently demon- 
strated (69). It was not actually 
necessary to immobilize the eye. 
Instead, an optical system was de-
vised which provided for displace-
ment of a vertical dark-line test-
object in correspondence with any 
horizontal eye movement, so that the 
diffraction image would always lie 
upon the same retinal receptors. 
Under these “compensated” condi- 
tions, fine test-objects disappeared 
after a few seconds and failed to re- 
appear. With a viewing time of one 
minute, maintenance of resolution 
was shown to benefit when eye move- 
ments had their normal effect, and 
was especially aided by “exag-
gerated” (the amplitude of image
motion due to eye movements was 
doubled by altering the optical sys- 
tem) eye movements. When short 
exposure times comparable to Rat- 
liff’s (65) value of 75 msec. were used 
results similar to his were obtained. 
Here, the “compensated” (no image 
movement) condition yielded the 
best visual Performance. With some- 
what longer exposure times, the 
normal and “exaggerated” conditions 
began to excel, and it is suggested 
that these conditions Operate to 
“shift the acuity task from one set 
of receptors to another in rapid suc- 
cession so that not all of the receptors 
at any one time have achieved a 
stationary state. . - eye movements 
are bad for acuity but good for Over- 
coming the logs of vision due to 
uniform stimulation of the retinal 
receptors” (69, italics mine). 

The investigation described above 
(69) is in many respects crucial to 
the question of the role of ocular 
tremor in acuity. The finding that 
with short exposures the “compen- 
sated” image was best for acuity is 
counter to the Marshall-Talbot 
theory. However, the rapid fading 
of fine lines under “compensation” 
with longer viewing time seems to 
indicate that once the retinal “on- 
response” (32) to the presentation 
of the test-object is over it will disappear 
unless similar responses are 
continuously evoked. Under normal 
viewing conditions, the fact that the 
“on” of short exposure, when coupled 
with the “on-responses” associated 
with diffraction image movement (due 
to tremor), should give rise to poorer 
acuity might be attributable to some 
“smearing” or confounding of the 
spatio-temporal relationships by the 
two simultaneous on-processes. Eye 
movements certainly produce superior 
acuity performances when allowed 
to operate over an appreciable 
period of time. Therefore, it seems 
quite possible that high acuity is 
normally obtained and maintained by 
“on-off responses,” which during 
prolonged viewing are continuously 
evoked by means of ocular tremor. 
Under normal viewing conditions, 
high short-exposure acuity might be 
obtained from the “on-response” to 
stimulus onset and fail to be maintained 
by tremor due to “smearing” 
as explained above. This view of the 
processes producing acuity differs 
from that given by Riggs et al. (69) 
quoted above. We consider eye 
movements to be always an aid to 
acuity (since they provide the necessary 
temporal illuminance gradients) 
unless simultaneously confounded 
with other “on-response” producing 
effects (e.g., flash exposure of the 
test-object). By “on-response” is 
meant the on-component of “on-off” 
optic nerve fiber activity (32). This 
will be given further consideration in 
a later section. 

It is possible to test the importance 
of “on-off responses” for acuity in a 
precisely controlled way by using a 
flickering test-object under “compensated” 
conditions. With this 
arrangement we would predict that a 
fine-line test-object would not disappear. 
This is just what occurs 
(Tom N. Cornsweet, personal communication, 
Feb. 7, 1955), but there are complications. 
Slow flicker in the neighborhood of 
one cycle per sec. permits continued 
seeing of the test-object with no fad- 
ing out, but there is a rapid decre- 
ment in the percentage of time for 
which the test-object is seen when 
flicker frequency is greater than 
about 1.5 cps. Electrophysiological 
studies on the cat indicate that the 
channel capacity for “on-off responses” 
in the retinal ganglia (19) 
and evoked cortical potentials (56) 
is much higher than 1 cps, and approximates 
the flicker fusion frequency. 
If resolution is related to the 
activation of “on-off” mechanisms 
(as we have suggested) then it is 
difficult to account for an optimum 
seeing condition around 1 cps with 
“compensation” and marked deterioration 
with higher frequencies 
which are well within the capacities 
of “on-off” mechanisms. Perhaps the 
supernormal period of the lateral 
geniculate is the important factor. 
Another piece of experimental evidence 
seems to militate against this 
view. Senders (71) found that when 
a flickering light source was used the 
luminance required for 
visual resolution to occur 
was less than would be required by 
the Talbot-Plateau law. Since “compensated” 
conditions were not used, 
eye tremor presumably occurred, 
and flicker “on-off responses” 
plus tremor-induced “on-off responses” 
should have been confounded 
according to the view presented 
above. It is interesting to 
note that Senders’ (71) enhancement 
of resolution occurs at flicker rates 
where Cornsweet, using the “compensated” 
condition, finds a decrement. 


Evidence on Reciprocal Overlap and 
Multiplication of Pathway 


The role of reciprocal overlap and 
multiplication of pathway in providing 
the anatomical basis for visual 
acuity is central to the Marshall- 
Talbot theory. They cite evidence 
of overlap in the lateral geniculate 
of the cat and pathway multiplica- 
tion in the monkey (see above). 

With respect to the possibility of 
overlap, the situation at the fovea 
is described by Polyak (64) as fol- 
lows: 

While it is possible and even probable that 
under special dynamical conditions the pure 
cone system [i.e., single cone—single midget 
bipolar—single midget ganglion, the “private 
line” system characteristic of the central 
region of the fovea] functions independently, 
it is likely that the same system, even in the 
foveal center, is rarely activated without some 
participation of the diffuse neurons, such as 
brush and flat bipolars. This widening of the 
“roadbed” for the centripetal impulses in- 
evitably blurs, to some degree, the distinct- 
ness of the photoreceptor processes, and in 
this way suppresses somewhat the effect of the 
clear-cut barriers separating the individual 
cones from one another (64, p. 431). 


However, it should be noted that 
Polyak claims the midget synapses 
themselves do not overlap recipro- 
cally. 

We have already distinguished be- 
tween sensitivity and acuity, the one 
referring to luminance threshold 
phenomena, the other to resolution 
of a test-object. Resolution depends 
upon, among other things, the diameter 
and spacing of the percipient 
elements. But also involved is the 

fact that beyond a small area at the 
center of the fovea groups of visual 
receptors tend to converge upon the 
same bipolar, and many bipolars 
funnel onto the same ganglion cell. 
The rods especially converge in this 
fashion, many of them using the same 
neural pathway. Where this mode of 
connection occurs it effects a coarser 
“neural grain” and thus reduces re- 
solving power, but it enhances 
spatial summation which increases 
sensitivity (32). 

Whatever the situation is at the 
retina, the great reduction in path 
in illuminance. “Off” fibers are 
apparently rare in mammals, and 
mammalian “on-off” responses undergo 
complex transformations as the 
parameters of stimulation are altered. 
Granit states that the cone system is 
organized for the interpretation of 
changes in the visual field, and that 
the off-effect in “on-off” fibers plays 
a major role since it has a shorter 
latency than the on-effect and can 
be quickly inhibited (27, p. 167). 
The off-effect may be inhibited by 
re-illumination, the inhibition oc- 
curring with a shorter latency than 
the new “on”-response (27, pp. 90-92). 
Further experimental work by 
Granit (29) has shown that the on- 
and off-components of “on-off” ele- 
ments are mutually inhibitory. Con- 
sidered along with ocular tremor, 
such functional antagonism would 
provide a high degree of sharpening 
of image borders. 

Hartline found that it was possible 
to map a receptive field on the retina 
for single fibers. Illumination any- 
where within a tiny retinal area 1 
mm. or so in diameter would produce 
a response in the fiber. The receptive 
field was of fixed type, being of the 
“on,” “off,” or “on-off” variety, and 
the fields of different fibers were observed 
to overlap one another. A 
recent investigation by Kuffler (45, 
46) on the retina of the cat yielded a 
rather more complicated arrangement. 
Receptive fields were found to 
contain three zones in concentric 
arrangement which allowed “on,” 
“off,” or “on-off” discharges to be 
obtained from any one ganglion cell: 


There exists a central area of low threshold 
as tested by a small spot of light. The dis- 
charge pattern of the central region is the op- 
posite of that found in the periphery or sur- 
round. The center may give predominantly 
“off,” the surround “on” discharges, or the 
reverse. An intermediate region gives “on-off” 
discharges (46). 


The type of discharge recorded at 
the ganglion cell depended upon a 
variety of factors such as background 
illuminance, intensity, duration, size 
and location of the exploring spot- 
stimulus within the receptive field. 
But perhaps of primary importance 
for the problem of visual acuity were 
the mutual modifications operating 
between the zones within each receptive 
field. Zonal interaction was 
studied by means of simultaneously 
exciting the central and marginal 
zones with small spot-stimuli. Stimulation 
of an “on” center field at the 
center and at the surround (“off” 
zone) showed that center and surround 
tended to suppress one another. 
“Off” center fields operate in 
an analogous fashion. Field centers 
tend to be dominant, but if stimulation 
of the surround is more intense 
they are inhibited. Thus, 
center responses are modulated by 
inhibitory surround activity to 
produce a large variety of discharge 
patterns. 

There are numerous other complex 
effects, not to mention the overlap 
of receptive fields. Our purpose is 
merely to indicate how zoning of the 
receptive field might function in 
acuity tasks. Kuffler (46) points out 
that with this antagonistic arrangement 
of central and surround zones 
within receptive fields, a slight shift 
of the eye can produce a great change 
in the discharge pattern. He states 
that ocular scanning movements 
“should be advantageous in the perception 
of contrast and in acuity.” 
If we consider an “on” center field 
moving from left to right in Figure 3, 
it is clear that as the “off” surround 
moves into the region where the illuminance 
gradient is steep the center 
will be inhibited. 
half of the “off” surround transects 
the gradient. Such modes of opera- 
tion in the retina would seem to be 
well suited for the production of 
sharp contours, while field overlap 
would refine “retinal local signs.”⁹ 
However, certain static stimulus 
factors may also bring about the for- 
mation of well-defined edges. Gra- 
ham and Granit (26) have shown 
that in the fovea there is inhibition 
of the more weakly activated area 
when differential stimulation is ap- 
plied to two areas. Likewise, Hart- 
line et al. (34) observed that even in 
the compound lateral eye of Limulus, 
“brightly illuminated areas of the eye 
inhibit the activity of dimly lighted 
regions more than the latter inhibit 
the activity of the former.” In both 
cases contrast would be enhanced. 
The experimental data on conditions 
yielding best acuity performances 
confirm this relationship. Acuity 
deteriorates at surround luminances 
greater than the luminance of the 
test-object background (13), presumably 
due to suppression of the 
area to be resolved. Fisher (20) has 
shown that increasing the size of a 
dimmer surround enhances acuity. 

⁹ It is well to remember that the cat has 
no true fovea and is a nocturnal animal 
sacrificing acuity for sensitivity. 

There are still other aspects to 
inhibitory processes and the ways 
in which they may be instrumental 
in resolution. That acuity 
should increase with light intensity 
seems paradoxical, for under 
such conditions greater spatial summation 
should operate to increase 
blurredness. Polyak (64) 
suggests that increases in intensity 
not only give rise to an 
increased response in the cones, but 
also spread inhibitory influences to 
surrounding areas via the horizontal 
and amacrine cells. This collateral 
suppression is said to limit the areal 
effects of local stimulation and improve 
acuity. There would probably 
be at least two factors operative in 
the production of such suppression. 
On the one hand, there is the contrast 
elicited by differential stimulation, 
as explained above; lateral inhibition 
makes for higher acuity. The finding 
(50) that inhomogeneous back- 
grounds yield greater acuity per- 
formances than homogeneous grounds 
with the same test-object might be 
due to the generalized inhibition of 
spatial summation induced by in- 
homogeneous background stimuli. 
On the other hand, it has been sug- 
gested by some investigators (16, 
49) that light adaptation itself some- 
how inhibits the action of collateral 
channels, allowing the foveal cones 
to behave as single units. Willmer 
(84) has investigated the influence 
of luminance upon the summation 
area as revealed by field sizes above 
which subjective brightnesses re- 
main constant. With increases in 
luminance above threshold levels the 
summation areas were found to de- 
crease, “indicating that polysynap- 
tic bipolar and nerve cells probably 
become less important; this may be 
one of the factors in increasing visual 
acuity with higher illuminances. . . .” 

The fact that central flicker inter- 
action may occur between two light 
patches without the disruption of 
fine-line discrimination in the inter- 
space (30) indicates that the retina 
is capable of minutely differentiated 
responses. In this case, there appears 
to be interaction across an area in 
which collateral suppression must 
be manifest. 

Mention should be made of the recent 
work of Motokawa (55) on the 
“field of retinal induction” as it 
bears on the problem of acuity. Using 
an electrostimulus method (described 
in 22), fields of induction surrounding 
retinal images in the 
human eye were mapped. Field 
strength was strongest at figure 
borders, and for optical illusions was 
distributed in ways which were re- 
lated to the characteristic perceptual 
phenomena. The Landolt ring and 
vernier-type acuity test-objects were 
analyzed, the vernier task inducing 
the more marked field deformation. 
This result is said to account for the 
much higher acuity measures ob- 
tained with this figure. 

Although acuity as a function of 
the color illumination has been explored 
somewhat, the factor of chromatic 
contrast as a basis for resolution 
has been almost entirely neglected 
and is given no place in any 
theory of acuity. Although it is not 
strictly an acuity task, discrimination 
of the numbers on the pseudo-isochromatic 
plates is entirely a wavelength 
discrimination, luminance contrast 
cues being eliminated. As Walls 
(78) puts the case: 

When at last someone determines visual 
acuities for colored targets on colored grounds 
of equal brightness, the advocates of brightness 
discrimination as the “sole” basis of 
resolution will be given new food for thought. 


The aim of this section has been 
not primarily to argue for any specific 
modes of retinal interaction, but 
rather to indicate certain fairly com- 
plex mechanisms which might play 
important roles in resolution. It is 
felt that the Marshall-Talbot theory, 
as it stands, takes insufficient ac- 
count of the acuity-producing actions 
within the retina proper. One must 
keep in mind that the visual system 
cannot resolve stimulus differentials 
unless they are embodied in differential 
retinal responses which may be 
channeled through the optic nerve as 
temporal sequences if not spatial 
distributions. In view of the evidence 
presented on the lateral geniculate 
nucleus (see above), it may be well 
to seek for the major peaking factor 
at the retinal level with refinement 
of “grain” perhaps occurring there, 
at Area 17, and beyond. 

THE PROBLEM OF INFERENCE FROM CURVES BASED ON GROUP DATA 

Papers by Sidman (8), Hayes (4), 
and Merrill (6) have raised serious 
questions about the validity of inferences 
from curves of functional 
relationship based on averaged data. 
By means of mathematical arguments 
and numerical illustrations, these 
writers have shown convincingly 
that “. . . given a mean curve, the 
form of the individual curves is not 
uniquely specified” (8, p. 268). This 
demonstration strikes close to home 
for the learning theorist. In the 
study of learning, we are interested 
in describing behavioral changes in 
individuals, but owing to limited control 
over behavioral variability must 
frequently depend upon averages 
for groups of organisms to determine 
functional relationships. In many 
areas we could scarcely remain in 
business if it were actually true that 
“. . . the mean curve does not provide 
the information necessary to 
make statements concerning the function 
for the individual” (8, p. 268). 
Unfortunately it is true. More accurately, 
it is true if we regard the 
mean curve solely as a source of inductive 
generalizations. This qualification 
suggests that possibly the fault 
lies, not in the averaged curves, but 
in our customary interpretations of 
them. 

It is noteworthy that learning 
theory, even quantitative learning 
theory, has made rather steady prog- 
ress in spite of the widespread ac- 
ceptance of a false methodological 
assumption. Apparently inferences 
from averaged curves, although not 
necessarily correct, must in fact often 
be so. This being the case, researchers 
in learning are unlikely to give up 
readily the habit of computing mean 
curves of functional relationship. 
My purpose in this note is to show 
that we need not feel obliged to try. 
The group curve will remain one of 
our most useful devices both for summarizing 
information and for theoretical 
analysis provided only that it is 
handled with a modicum of tact and 
understanding. 

The principal point to be made is 
that the valid treatment of averaged 
curves depends upon the same principles 
of statistical inference that 
have become familiar to all of us in 
such cases as the analysis of variance 
and the chi-square test. Just as any 
mean score for a group of organisms 
could have arisen from sampling 
any of an infinite variety of populations 
of scores, so also could any 
given mean curve have arisen from 
any of an infinite variety of populations 
of individual curves. Therefore 
no “inductive” inference from mean 
curve to individual curve is possible, 
and the uncritical use of mean curves 
even for such purposes as determining 
the effect of an experimental 
treatment upon rate of learning or 
rate of extinction is attended by considerable 
risk. These considerations 
set rather severe limitations upon the 
use of mean curves in the study of 
learning. Nonetheless we can anticipate 
that, as so regularly turns out 
to be the case in scientific research, 
our virtue in accepting these limitations 
will not go unrewarded. The 
same type of theoretical inquiry that 
has led to recognition of the need for 
caution in handling averaged data may 
be turned in a constructive direction 
and lead to more effective exploitation 
of the one defensible and important 
theoretical application that 
remains for the averaged curve: the 
testing of exact hypotheses about 
individual functions. 

The first step in this direction is to 
recognize that the effects of averaging 
are not in any way capricious or 
unpredictable and need not be regarded 
as artifacts or distortions. 
Distortion arises only if unwarranted 
inferences are drawn from the mean 
curves. But given any specified assumption 
about the form of individual 
functions, we can proceed to 
deduce the characteristics to be 
expected of an averaged curve and 
then to test these predictions against 
obtained data. As in any problem of 
statistical inference, it will always be 
true that other assumptions might 
yield the same predictions. The task 
undertaken will be, however, to test, 
not the infinity of possible hypotheses, 
but only the one hypothesis 
under consideration. 

In testing quantitative theories 
against averaged data we may be concerned 
either (a) with the form of a 
functional relationship or (b) with 
parameter values for the population 
of organisms sampled. Case a is 
illustrated by the formerly popular 
pastime of trying to determine “the 
form of the learning curve” or by 
the attempts to verify Hull’s hypothesis 
that habit strength is an exponential 
function of number of 
reinforcements (5). Case b is illustrated 
by attempts to determine 
whether the slope parameter of the 
habit growth curve depends upon 
amount of reinforcement (11) or 
whether the rate and asymptote of 
maze learning are functions of stimulus 
variability (9). 

In studies involving Case a, it 
has been customary to operate on the 
tacit assumption that the form of a 
mean curve will reflect faithfully the 
form of the individual curves. Since 
this assumption is now recognized to 
be unwarranted, we can no longer 
expect averaged data to yield any 
direct answer to the question, ‘‘What 
is the form of the individual func- 
tion?” We can, however, replace this 
question with one which can be an- 
swered, namely, “Is the form of the 
mean empirical curve in accord with 
the assumption that the individual 
functions are of a given form, say 
$y=f(x, a, b, \ldots)$?” (In the remainder 
of the discussion we shall represent 
by f the function relating a dependent 
variable y to an independent 
variable x and parameters a, b, etc.) 
It becomes a specific mathematical 
or statistical research problem to 
determine for any given function f 
what testable predictions can be 
made concerning the mean curve for 
a group of organisms. Some pre- 
liminary considerations that may be 
helpful in dealing with this type of 
problem will be discussed below. 

In studies involving Case b the as- 
sumption has frequently been made 
that if the function obtained for the 
individual organism is $y=f(x, a, b, \ldots)$, 
then the function describing 
the mean curve for a group of organisms 
should be $\bar{y}=f(x, \bar{a}, \bar{b}, \ldots)$, 
i.e., a curve of the same form with 
parameters equal to the means of the 
corresponding individual parameters. 
Since the assumption is not generally 
true, the treatment of this case will 
require, first, recognizing the in- 
stances in which the assumption 
holds, and, second, investigating in- 
stances in which it does not hold in 
order to determine what information 
about parameter values is obtainable 
from the mean curve. 


CLASSIFICATION OF FUNCTIONS 

Relative to these problems, the 
mathematical functions that we will 
have occasion to deal with can be 
classified into three types, each call- 
ing for somewhat different treatment. 
Let us consider briefly the problems 
that will arise in dealing with each of 
these types and illustrate some of the 
procedures that will prove useful in 
dealing with them. 

Class A. Functions unmodified by 
averaging. In these cases the mean 
curve for the group has the form of 
the individual function and the pa- 
rameters of the mean curve are simply 
the means of the corresponding in- 
dividual parameters. The chief prob- 
lem here is that of defining the class of 
functions so that we will recognize 
instances of it. The essential charac- 
teristics of the class will be apparent 
from consideration of a few examples: 


1. $y = a + bx$ 
2. $y = a + bx + cx^2$ 
3. $y = a \log x$ 
4. $y = a \sin x + b \cos x$ 
5. $y = a/x$. 


A numerical illustration involving 
one of these examples will show in a 
concrete way how the averaging 
process works out for this type of 
function. Suppose that we have 
two organisms whose behavior in a 
learning situation is described by the 
function y=a log x, where a is a 
constant which varies in value from 
one organism to another, but remains 
fixed in value throughout learning 
for any one organism. Let $y_1$ and 
$y_2$ be response measures for the two 
organisms, and let the value of a be 
1 for the first organism and 2 for the 
second. Then the course of learning 
for the two organisms will be described 
by the equations 

$$y_1 = \log x$$ 

and 

$$y_2 = 2 \log x,$$ 

respectively. 


Now we compute the “empirical” 
response measures for each organism 
for the first four values of the independent 
variable x as indicated in 
Table 1. Then by averaging the two 
response measures at each value of 
x, we obtain the mean “empirical” 
curve represented by the values in 
the column headed $\bar{y}$. It is clear, 
however, that the column of mean 
values also represents the values of 
the function $\bar{y} = 1.5 \log x$. Therefore 
the function describing the mean 
curve is of the same form as the 
individual functions, and the parameter 
of the function describing the 
mean curve is the mean of the individual 
parameters. 

TABLE 1 

EFFECT OF AVERAGING A SIMPLE LOGARITHMIC FUNCTION 

 x    log x    $y_1$    $y_2$    $\bar{y}$    1.5 log x 
 1     .00      .00      .00      .00         .00 
 2     .30      .30      .60      .45         .45 
 3     .48      .48      .96      .72         .72 
 4     .60      .60     1.20      .90         .90 
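The arithmetic of Table 1 can be checked mechanically. The following is a minimal sketch, assuming base-10 logarithms (which the tabled values imply); the helper names `mean_curve` and `log_curve` are hypothetical and serve only this illustration.

```python
import math

def mean_curve(params, xs, f):
    """Average f(x, a) over individuals (one parameter value each) at every x."""
    return [sum(f(x, a) for a in params) / len(params) for x in xs]

def log_curve(x, a):
    # Individual function y = a log x (base 10 assumed, as in Table 1)
    return a * math.log10(x)

xs = [1, 2, 3, 4]
a_values = [1.0, 2.0]                   # the two organisms of Table 1
y_bar = mean_curve(a_values, xs, log_curve)

a_bar = sum(a_values) / len(a_values)   # mean parameter, 1.5
predicted = [log_curve(x, a_bar) for x in xs]

for x, m, p in zip(xs, y_bar, predicted):
    print(f"x={x}  mean y={m:.2f}  1.5 log x={p:.2f}")
```

The averaged values and the values of the function with the mean parameter coincide at every x, which is the defining property of the class under discussion.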

All functions belonging to this 
class work out similarly. Stated in 
the simplest terms, what they all 
have in common is that each parameter 
in the function appears either 
alone or as a coefficient multiplying 
a quantity which depends only on 
the independent variable x. In averaging, 
any quantity of the latter 
sort factors out at each value of x 
and appears in the mean curve 
multiplying the mean value of the 
parameter.¹ 

Class B. Functions for which averaging 
complicates the interpretation of 
parameters but leaves form unchanged. 
Examples of functions falling in this 
class² are 

1. $y = \log bx$ 
2. $y = \dfrac{1}{a} + \dfrac{1}{bx}$. 

¹ See Mathematical Note 1. 


In the first example, we can rewrite 
the function in the form 

$$y = \log b + \log x;$$ 

then it is apparent that the mean 
curve for a group of organisms which 
differ with respect to parameter b 
will be logarithmic in form, for the 
same reasons discussed in the preceding 
section, but will have the 
mean value of log b rather than 
$\log \bar{b}$ as the intercept constant. Thus 
from a mean empirical curve, we can 
obtain an estimate of the geometric 
mean of the parameter b for the 
population sampled, but no estimate 
of the arithmetic mean of b. 

In the second example, the mean 
curve of y vs. 1/x will be linear, but 
the parameters of the mean curve 
will be the mean values of 1/a and 
1/b for the organisms sampled, so no 
estimate of a or b can be obtained 
from the averaged data. 

The testing of hypotheses involving 
functions in this class raises no 
difficulties if we are interested only 
in the form of the function; if we wish 
to estimate mean parameter values 
or to test hypotheses involving 
changes in parameter values as a 
function of experimental treatments, 
then care must be taken to allow for 
the effects of averaging. 
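The point about the logarithmic example can be illustrated numerically. The sketch below assumes three hypothetical organisms with parameter values b = 1, 4, and 16, and uses natural logarithms; the function name `mean_log_curve` is likewise an assumption of the illustration.

```python
import math

def mean_log_curve(b_values, xs):
    """Mean of y = log(b*x) over individuals, at each x (natural logs)."""
    n = len(b_values)
    return [sum(math.log(b * x) for b in b_values) / n for x in xs]

b_values = [1.0, 4.0, 16.0]      # hypothetical individual parameters
xs = [1.0, 2.0, 5.0, 10.0]
y_bar = mean_log_curve(b_values, xs)

# The mean curve is log x plus a constant, so subtracting log x
# leaves the same intercept at every x.
intercepts = [y - math.log(x) for y, x in zip(y_bar, xs)]
b_from_curve = math.exp(intercepts[0])

geometric_mean = math.exp(sum(math.log(b) for b in b_values) / len(b_values))
arithmetic_mean = sum(b_values) / len(b_values)
print(b_from_curve, geometric_mean, arithmetic_mean)
```

The value recovered from the intercept of the mean curve agrees with the geometric mean of b and not with the arithmetic mean, as the argument above requires.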
Class C. Functions modified in form 
by averaging. A function will fall in 
this class³ if it contains any terms 
involving the independent variable 
x which will not factor out when we 
sum values of y over a group of 
organisms for a constant value of x. 
The most familiar example of a function 
belonging to this class is the 
“growth” curve 

$$y = a + be^{-cx},$$ 

encountered in some guise or other in 
many learning theories, and given 
detailed discussion in Sidman’s paper (8). 

² See Mathematical Note 2. 
³ See Mathematical Note 3. 
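That averaging alters the form of the “growth” function can be seen in a small numerical case. The sketch below assumes two hypothetical organisms with a = 0, b = 1, and rate constants c = 0.2 and c = 1.0. For a pure exponential the slope of log y against x is constant, so a changing slope in the mean curve shows that the mean curve is not of the individual form.

```python
import math

def growth(x, a, b, c):
    # Individual "growth" function y = a + b * exp(-c * x)
    return a + b * math.exp(-c * x)

def log_slopes(ys, xs):
    """Slopes of log(y) between successive x values; constant for a pure exponential."""
    return [(math.log(ys[i + 1]) - math.log(ys[i])) / (xs[i + 1] - xs[i])
            for i in range(len(xs) - 1)]

xs = [0, 1, 2, 3, 4, 5]
rates = [0.2, 1.0]   # hypothetical individual values of c; a = 0, b = 1 for both
individual = [[growth(x, 0.0, 1.0, c) for x in xs] for c in rates]
mean_y = [sum(col) / len(col) for col in zip(*individual)]

slopes_individual = log_slopes(individual[0], xs)   # constant, equal to -0.2
slopes_mean = log_slopes(mean_y, xs)                # changes along the curve

print([round(s, 3) for s in slopes_individual])
print([round(s, 3) for s in slopes_mean])
```

In this case the mean curve falls steeply at first and then more slowly, approaching the rate of the slower organism, so no single exponential describes it.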

In some cases, a function belonging 
to this class can be moved into Class 
B or even Class A by means of an 
appropriate transformation. Take, 
for example, the exponential function 
given above. If the value of the parameter 
a is known for all individuals, 
it can be subtracted from the response 
measure y, leaving us with the 
simpler equation 

$$y' = y - a = be^{-cx}.$$ 

The latter can be made more tractable 
by the logarithmic transformation 

$$\log y' = \log b - cx,$$ 

which when averaged yields 

$$E(\log y') = E(\log b) - \bar{c}x,$$ 

where E() represents the mean, or 
expected, value of the term in paren- 
theses. If, then, we take logarithms 
(base e) of the dependent variable 
y’ and plot the transformed variable 
as a function of x, both the curve for 
any individual and the averaged 
curve for a group will be linear; from 
the mean curve we can obtain estimates 
of the mean value of the parameter 
c and of the geometric mean of 
the parameter b. By means of this 
stratagem the problem of testing the 
hypothesis that an exponential function 
holds for individual organisms 
has been reduced to the very simple 
problem of determining whether the 
mean curve plotted from the transformed 
data departs significantly 
from linearity. Similarly, other 
hypotheses that might be tested 
against the group data are greatly 
simplified. Suppose, for example, 
that a theoretical curve of extinction 
took the form of this exponential 
function, with y being a response 
measure, x number of trials, and 
the asymptote a equal to zero, and 
that we were interested in the ques- 
tion whether some difference in the 
experimental treatments given two 
groups of organisms influenced rate 
of extinction; by means of the sug- 
gested transformation, this prob- 
lem would reduce to that of testing 
for a difference in slope between two 
regression lines. A variety of trans- 
formations which may be useful in 
situations of this sort have been 
discussed by Mueller (7). 
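The effect of the logarithmic transformation may be sketched as follows, under the assumptions that the asymptote a is known for each organism and that the data are free of error. The three parameter sets and the least-squares helper `fit_line` are hypothetical, introduced only for the illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical individuals: (a, b, c) in y = a + b * exp(-c * x), with a known.
individuals = [(2.0, 1.0, 0.2), (2.0, 4.0, 0.5), (2.0, 16.0, 0.8)]
xs = [0, 1, 2, 3, 4]

def mean_log_y_prime(x):
    # Subtract the known asymptote a, take natural logs, then average over individuals.
    vals = [math.log((a + b * math.exp(-c * x)) - a) for a, b, c in individuals]
    return sum(vals) / len(vals)

slope, intercept = fit_line(xs, [mean_log_y_prime(x) for x in xs])
mean_c = -slope                        # estimate of the mean of c
geometric_mean_b = math.exp(intercept) # estimate of the geometric mean of b
print(mean_c, geometric_mean_b)
```

Here the averaged transformed curve is exactly linear; its slope returns the mean of the individual rate constants and its intercept the geometric mean of b. A comparison of two treatment groups would then rest on the difference between two such fitted slopes.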

Even when functions in Class C 
cannot be moved into one of the more 
docile classes by any available trans- 
formation, or when for some reason 
transformation of the data is un- 
desirable (as might be the case if a 
contemplated transformation pro- 
duced heterogeneity of variances 
along the curve), we are not neces- 
sarily helpless. The extent to which 
functional form is modified by aver- 
aging will generally depend upon the 
dispersion of parameter values in 
the group of organisms sampled; thus 
in some cases it may be possible by 
studying individual curves to estimate 
the dispersion of parameter 
values in the group and determine 
whether the form of the mean curve 
can be expected to conform closely 
to the form of the individual functions; 
see, e.g., (3). Further, even 
in the case of the most refractory 
functions, it will usually be possible 
by appropriate mathematical analy- 
sis to derive the main characteristics 
that should be predicted for an aver- 
aged curve; an analysis of this sort 
for a “growth” function has been described 
in a recent paper (2). 


THE ROLE OF EXPERIMENTAL ERROR 


The analysis given here might be 
objected to on the grounds that we 
have considered only the effects of 


averaging upon data obtained from 
idealized organisms which behave 
strictly in accordance with theoretical 
functions. Response measures ob- 
tained from real organisms may, on 
the other hand, be influenced by 
various sources of experimental error 
as well as by the variables taken ac- 
count of in a given theory. The ob- 
jection is pertinent, but not fatal. 
The answer is that in testing a theoret- 
ical prediction one must make some 
explicit assumption about the role of 
experimental error in the test situa- 
tion. And as in any statistical test, 
the validity of the conclusions will 
be conditional upon the degree to 
which such assumptions are satisfied. 
In some instances, it may be reasonable
to assume that the contribution of
experimental error is negligible; then the
analyses given above will apply without
modification. Frequently it will be more
reasonable to operate under the assumption,
routinely made in working with
analysis-of-variance models, that error
combines additively with treatment effects
to determine the observed response
measures. In this case, if we wish to test
the hypothesis that a function
$y = f(x, a, b, \ldots)$ holds for individuals,
we will assume that the observed response
measure $Y$ for any individual is equal to
the sum of $y$ and a random variable $e$
which represents the contribution of
experimental error, i.e.,


$$Y = y + e = f(x, a, b, \ldots) + e$$


Now if the error variable $e$ is independent
of $x$, and if the function falls in our
Class A, averaging of individual curves
will yield a mean curve described by the
function


$$\bar{Y} = \bar{y} + \bar{e} = f(x, \bar{a}, \bar{b}, \ldots) + \bar{e}$$


If the mean value of e is zero, which
will, for example, be the case whenever
the distribution of errors is
normal, then the form of the mean
curve will be unaffected by the error 
term; if the mean is not zero, then the 
mean function will be modified only 
by the addition of a constant and the 
plotted mean curve will be changed 
only by a vertical displacement. In 
some cases the error variable may 
interact with experimental variables. 
If the nature of the interaction can be 
stated explicitly, then its effects upon
the averaging process can be determined
by appropriate analysis. In
situations where error variables and
experimental variables interact in 
complex or unknown ways, exact 
tests of quantitative hypotheses will 
generally be impossible. 
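
The additive-error argument above can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes the Class A function y = a log x, three hypothetical individuals with made-up parameter values, and one made-up error value per individual that does not vary with x. Under those assumptions the mean curve should differ from the error-free mean curve only by the constant mean error.

```python
import math

def mean_curve(x_values, a_values, errors):
    """Mean of individual curves Y = a*log(x) + e, one (a, e) pair per individual."""
    n = len(a_values)
    return [sum(a * math.log(x) + e for a, e in zip(a_values, errors)) / n
            for x in x_values]

x_values = [1.0, 2.0, 4.0, 8.0]
a_values = [0.5, 1.0, 1.5]   # hypothetical individual parameter values
errors = [0.3, -0.1, 0.4]    # hypothetical errors, independent of x; mean 0.2

a_bar = sum(a_values) / len(a_values)
e_bar = sum(errors) / len(errors)

observed = mean_curve(x_values, a_values, errors)
error_free = [a_bar * math.log(x) for x in x_values]

# Vertical displacement of the mean curve at each x; expected to be constant.
displacement = [obs - ef for obs, ef in zip(observed, error_free)]
```

With a nonzero mean error the displacement is the same at every value of x, so the plotted mean curve is shifted vertically and its form is unchanged.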


SUMMARY 


These comments are not meant to
provide an exhaustive treatment of
the problem of averaging. The one
point I have tried to bring out clearly
is that the valid interpretation of
group curves⁴ depends on the principles
common to all problems of
statistical inference. Although the
form of a group mean curve does not
determine the forms of the individual
curves, it does provide a means of
testing exact hypotheses about them.
In each particular case, the procedure
must be to state explicitly the hypothesis
under test, and then to derive
the properties that should hold for
the averaged curve if the hypothesis
is correct. If the predictions thus
derived are in accord with data, the
hypothesis remains tenable; if they
are not, then the hypothesis can be
rejected at some specified level of
confidence. Utilized within this
framework, the averaged curve can
be expected to remain one of the
most valuable techniques for the
analysis of behavioral data, and in
fact to increase progressively in
value as mathematical and statistical
research continue to enlarge our repertory
of special devices for the
handling of particular problems.


4 Throughout this discussion we have
spoken in terms of mean curves obtained from
groups of organisms. Similar problems arise,
and similar considerations apply, however, in
the case of a curve whose points represent
means of repeated measures on the same
organism. Parameter values associated with
an individual organism may vary either
systematically or randomly during the course of
an experiment. In either case, we may think
of each possible combination of parameter
values as determining a hypothetical curve,
the population of curves being sampled at
each value of the independent variable.
Whether the obtained mean curve should be
expected to have the same form as the
hypothetical individual curves will depend on the
nature of the mathematical function describing
the latter and on the role of experimental
error, just as in the case of a group curve.


MATHEMATICAL NOTES 


1. A more formal criterion for
class inclusion is desirable for some
purposes, and may be formulated as
follows.⁵ Let us consider a function
$y = f(x, a, b, \ldots)$. At any given
value of $x$, we may regard $y$ as a
function of the parameters $a$, $b$, etc.,
and expand the function in a Taylor's
series around the mean values of the
parameters (6, 10), obtaining the
relation


$$y = f(x, \bar{a}, \bar{b}, \ldots) + (\Delta a) f_{a1} + (\Delta b) f_{b1} + \cdots + \frac{(\Delta a)^2}{2} f_{a2} + \cdots$$


where $\bar{a} + \Delta a$ is the value of the $a$
parameter for a given organism; $f_{ai}$
represents the $i$th derivative of $y$ with
respect to $a$, evaluated at $a = \bar{a}$; and
so on. When the function is averaged
over a group of individuals, we obtain


$$\bar{y} = f(x, \bar{a}, \bar{b}, \ldots) + \tfrac{1}{2}\sigma_a^2 f_{a2} + \tfrac{1}{2}\sigma_b^2 f_{b2} + \cdots$$


Our criterion for inclusion of a function
in Class A may now be stated:
if in the Taylor's series development,
all second and higher order partial
derivatives of the function with respect
to parameters are zero, then
the function is unmodified by averaging.
Applying the criterion to
$y = a \log x$, we have $f_{a1} = \log x$; $f_{a2} = 0$;
and therefore $\bar{y} = \bar{a} \log x$, in agreement
with the conclusion reached
above by a more informal route.


5 A criterion proposed by Bakan (1), which
involves expanding the function in a Maclaurin
series around the point x=0, is not
entirely satisfactory. For one thing it is
frequently inapplicable. Take, for example, the
functions $y = a \log x$ or $y = x^a$; in neither case
are the derivatives all continuous at x=0, so
in neither case will the series generally
represent the function. The criterion suggested in
the present paper will hold for all functions
which can be expanded by Taylor’s theorem,
a class which includes all the elementary
functions and, in fact, all explicit functions
that the psychologist is apt to have dealings
with.
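
The Class A criterion can be illustrated numerically. The sketch below is an illustration under stated assumptions, not a computation from the paper: the four parameter values are hypothetical, and the function y = exp(-ax) is chosen here only as a convenient example whose second derivative with respect to the parameter, x² exp(-ax), depends on x. For y = a log x the mean curve should coincide with the function evaluated at the mean parameter; for the exponential it should not, and the second-order term ½σ²f_{a2} should account for most of the discrepancy.

```python
import math

a_values = [0.5, 1.0, 1.5, 2.0]   # hypothetical parameter values for four organisms
x_values = [0.5, 1.0, 2.0]
n = len(a_values)
a_bar = sum(a_values) / n
var_a = sum((a - a_bar) ** 2 for a in a_values) / n   # sigma_a squared

# y = a log x: all second and higher derivatives with respect to a are zero.
mean_log = [sum(a * math.log(x) for a in a_values) / n for x in x_values]
form_log = [a_bar * math.log(x) for x in x_values]

# y = exp(-a x): the second derivative with respect to a is x^2 exp(-a x).
mean_exp = [sum(math.exp(-a * x) for a in a_values) / n for x in x_values]
form_exp = [math.exp(-a_bar * x) for x in x_values]
taylor_exp = [f + 0.5 * var_a * x ** 2 * f for f, x in zip(form_exp, x_values)]

# Largest gap between the true mean curve and each approximation.
err_zero_order = max(abs(m - f) for m, f in zip(mean_exp, form_exp))
err_second_order = max(abs(m - t) for m, t in zip(mean_exp, taylor_exp))
```

The second-order approximation is not exact, because higher-order terms are dropped, but it is much closer to the averaged curve than the function evaluated at the mean parameter.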

2. A sufficient criterion for inclusion
of a function $y = f(x, a, b, \ldots)$
in Class B is that it does not satisfy
the criterion of Class A when expanded
around $\bar{a}$, $\bar{b}$, etc., but does
satisfy that criterion when rewritten
$y = f(x, u, v, \ldots)$ and expanded
around $\bar{u}$, $\bar{v}$, etc. ($u$, $v$, etc. being
functions of the parameters $a$, $b$,
$\ldots$). In the first example under
Class B above, this criterion is satisfied
if we let $\log b = u$; in the second
example, it is satisfied if we let $1/a = u$
and $b/a = v$.

3. If a function falls in Class C,
then in the Taylor’s series developments
described above, some of the
second or higher order derivatives
will depend on x regardless of how
u, v, etc. are chosen, and thus the
criteria for Class A or Class B cannot
be satisfied.

It will be noted that these formal
criteria provide more rigorous definitions
of the various classes than
can be given in nonmathematical
terms. However, it should be emphasized
that the conclusions about
inference from averaged curves that
we have reached in this paper do not
depend on abstruse mathematical
analyses. In many practical situations,
questions concerning the effects
of averaging can be handled by
simple numerical methods of the type
illustrated in an earlier section.

The purpose of this paper is to
extend methods presently available 
for use in the analysis and the com- 
parison of curves obtained in psy- 
chological experiments. The reader 
will be assumed to have a general 
acquaintanceship with analysis-of- 
variance techniques at an elementary 
level, such as could be obtained from 
most of the better available textbooks.
The present paper will deal
with the analysis of curves that result 
when the difference between experi- 
mental treatments involves a scaled 
independent variable with equal steps 
(or equal logarithmic steps) between 
levels of the independent variable. 
Examples would be learning curves, 
extinction curves, dark adaptation 
curves, time-to-attain a level of dark 
adaptation as a function of the lumi- 
nance of the preadaptation field, re- 
sponse rate as a function of the 
amount of reinforcement, etc. The 
procedures apply to most curves or
sets of curves in which y, the de- 
pendent variable, is plotted as a 
function of x, the independent variable,
where there are equal intervals
between the values of x. It is assumed
that there is random normal varia- 
tion in the dependent variable at 
each data point with equal error 
variances. There may or may not be 
parameters expressing qualitative or
quantitative variation of additional 
independent variables that are orthog- 
onal to x. 


1 This paper was written during a research
leave supported by the Graduate School Research
Committee of the University of Wisconsin
from funds provided by the Wisconsin
Alumni Research Foundation.


ANALYSIS OF SINGLE TRENDS 


The first procedure, which will be 
briefly presented, has been described 
in Fisher and Yates (6) and Pearson 
and Hartley (8) and in other readily 
available sources. 

As an example, consider a selected 
set of data from the experiment of 
Grant and Schiller on the generaliza- 
tion of the conditioned GSR to visual 
stimuli (7). In this experiment seven 
different groups of subjects were all 
conditioned to give GSR’s to a 12- 
inch visual stimulus. The seven 
groups were then extinguished on 9- 
inch, 10-inch, 11-inch, 12-inch, 13- 
inch, 14-inch, and 15-inch visual 
stimuli, respectively. The average 
magnitude of GSR on the initial 
extinction trial, using the log-conduct- 
ance transformation for 14 subjects 
in each group, is given in the broken
curve of Figure 1. 

It was expected that these data 
might show two general tendencies. 
First, although experimental steps
were taken to prevent it, it was expected
that there would be a tendency
for the longer stimuli to produce
larger GSR’s. In other words,
the means would tend to drift up- 
ward from left to right in Figure 1 
as indicated by the line labeled 
“linear component.” Secondly, it 
was expected that the data would
show a generalization function which 
might be a symmetric, decreasing 
function about the 12-inch test stimu- 
lus as indicated by the curve marked 
“quadratic component.” The obtained
data seem to be a composite of
both of these tendencies plus some
random variation. It is obvious,
under these circumstances, that the
between groups F is singularly unrevealing.
The means for the generalization
test data appear in the second
column of Table 1, and the analysis
of variance is summarized in the lower
part of the table. The between
groups mean square is statistically
significant as shown by the F of 2.23,
but the conclusion permitted by this
F test, namely, that the group means
differ significantly, has little bearing
on the phenomena with which the
experiment deals.
Far better tests of the specific
hypotheses in which we are interested
can be obtained from rather elementary
mathematical and statistical
considerations. In general if y, the
dependent variable, is a function of
x, the independent variable, then y
may be expressed in the form of a
power series.


$$y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots \qquad [1]$$


FIG. 1. FIRST GENERALIZATION TEST TRIAL DATA FROM GRANT-SCHILLER EXPERIMENT;
TRANSFORMED GSR PLOTTED AGAINST VALUE OF THE TEST STIMULI


Independent statistical tests of the
linear, quadratic, cubic, and higher
components of a curve can be obtained
because the above formula
may be written in the form:


$$y = A_0 + A_1\phi_1(x) + A_2\phi_2(x) + A_3\phi_3(x) + \cdots, \qquad [2]$$


where $\phi_i(x)$ is the orthogonal polynomial
of ith degree. Under the conditions
of normal, homogeneous, random
variations about the data points,
and equal intervals between levels
of the independent variable, $A_i$ can
be independently estimated and
tested so that independent tests are
available for the existence of significant
linear, quadratic, cubic, etc.,
components of the trend. The procedure
is standard (6, pp. 27-29; 8,
pp. 91-95).
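
The independence of the component tests rests on the mutual orthogonality of the polynomial coefficients. A minimal sketch of that property follows, using the standard seven-level coefficients (the same values that appear in Table 1); the variable names are illustrative only.

```python
# Orthogonal polynomial coefficients for seven equally spaced levels,
# one row per degree (linear through sextic).
PHI = [
    [-3, -2, -1,   0,  1,  2, 3],
    [ 5,  0, -3,  -4, -3,  0, 5],
    [-1,  1,  1,   0, -1, -1, 1],
    [ 3, -7,  1,   6,  1, -7, 3],
    [-1,  4, -5,   0,  5, -4, 1],
    [ 1, -6, 15, -20, 15, -6, 1],
]

def dot(u, v):
    """Sum of products of two coefficient sets."""
    return sum(a * b for a, b in zip(u, v))

# Sum of squared coefficients for each degree (the divisors used later).
sums_of_squares = [dot(row, row) for row in PHI]

# Cross-products between every pair of different degrees; all should be zero.
cross_products = [dot(PHI[i], PHI[j])
                  for i in range(len(PHI)) for j in range(i + 1, len(PHI))]

# Each set also sums to zero, i.e., is orthogonal to the unit coefficients.
row_sums = [sum(row) for row in PHI]
```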

In the Grant-Schiller experiment,
the hypothesis that increasing size
of stimulus is related to increasing
GSR is best investigated by testing
the significance of the linear component
of the trend, and the hypothesis
that there is a generalization
gradient decreasing symmetrically
from the 12-inch test stimulus is
best investigated by testing the significance
of the quadratic component of
the trend. The procedure for making
the test is simplicity itself. First of
all, the group totals are written down
in order, as in column three of Table 1.
There are seven data points for 6 degrees
of freedom between group
totals, so that the contribution of
each of six orthogonal components
can be tested. The values of the six
orthogonal polynomials for each of
the seven data points are obtained
from Pearson and Hartley's table
(8, p. 212). These values are given
in the right-hand upper portion of
Table 1, under the columns headed
$\phi_{1k}, \phi_{2k}, \ldots, \phi_{6k}$. For example, the
values of the fourth polynomial,
$\phi_{4k}$, are 3, -7, 1, 6, 1, -7, and 3,
respectively. In addition, the unit
coefficients have been written in
column headed I. Sums of cross-products
are then computed for each
of the orthogonal polynomials. These
are obtained by summing the products
of the polynomial times the
respective group totals. The sums
of cross-products are labeled $P_j$, and
they appear at the foot of the upper
part of Table 1. Under column
headed I is the sum of all of the group
totals or the grand total GT. To obtain
$P_1$, the sum of cross-products for
the 1st degree polynomial, we have:
21.34(-3) + 37.01(-2) + 59.87(-1)
+ 54.09(0) + 54.10(1) + 39.03(2)
+ 51.86(3) = 89.83. Similarly, under
the column $\phi_{4k}$ we have 21.34(3)
+ 37.01(-7) + 59.87(1) + 54.09(6)
+ 54.10(1) + 39.03(-7) + 51.86(3)
= 125.83. Also given in Table 1 is
$\sum_k \phi_{jk}^2$ for each of the six sets of
polynomial coefficients. Thus, for
the first orthogonal polynomial
$\sum_k \phi_{1k}^2 = (-3)^2 + (-2)^2 + (-1)^2 + (0)^2 + (1)^2 + (2)^2 + (3)^2 = 28$.
The sum of squares
for each orthogonal component is
then simply: $SS_{P_j} = P_j^2 / n \sum_k \phi_{jk}^2$,
where n is the number of scores
making up each group total. More
generally, this formula can be written:


$$SS_{P_j} = \Big(\sum_k \phi_{jk} T_k\Big)^2 \Big/ \Big(n \sum_k \phi_{jk}^2\Big), \qquad [3]$$


where the subscript, j, refers to the
order of the polynomial and the subscript,
k, refers to the level of the
independent variable.


TABLE 1
ANALYSIS OF GENERALIZATION CURVE FROM GRANT-SCHILLER DATA

Group    Mean     T_k       I     φ1k      φ2k     φ3k     φ4k     φ5k     φ6k
  9      1.52    21.34      1     -3        5      -1       3      -1       1
 10      2.64    37.01      1     -2        0       1      -7       4      -6
 11      4.28    59.87      1     -1       -3       1       1      -5      15
 12      3.86    54.09      1      0       -4       0       6       0     -20
 13      3.86    54.10      1      1       -3      -1       1       5      15
 14      2.79    39.03      1      2        0      -1      -7      -4      -6
 15      3.70    51.86      1      3        5       1       3       1       1

P_j = Σ φ_jk T_k          317.30   89.83  -192.27   34.27  125.83   -6.41  244.71
Σ φ_jk²                      7      28       84       6     154      84     924

Summary of Analysis of Variance

Source of Variation           df    Sum of Squares    Mean Square      F
A. Error (Within Groups)      91          ?              5.8406
B. Between Groups              6       78.0097          13.0016      2.2261*
C.   Linear                    1       20.5853          20.5853      3.52
D.   Quadratic                 1       31.4352          31.4352      5.38*
E.   Cubic                     1       13.9813          13.9813      2.39
F.   Quartic                   1        7.3438           7.3438      1.26
G.   Quintic                   1        0.0349           0.0349      0.01
H.   Sextic                    1        4.6292           4.6292      0.79

* Significant at the 5% confidence level.
? Entry illegible in the source copy.


For the linear component, then,
the sum of squares will equal
(89.83)²/(14)(28) or 20.5853; and for
the 4th polynomial the sum of squares
will be (125.83)²/(14)(154) or 7.3438.
The sums of squares are entered in
the summary of the analysis of variance
table. A good check on the
computations is that the sum of
squares for all of the orthogonal components
must equal the $SS_{\text{between groups}}$.
In this particular instance, only the
linear and quadratic sums of squares
would be computed to test specific
hypotheses, and the additional component
sums of squares need be computed
only to check the accuracy of
computations.
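
The computation of the cross-products and component sums of squares can be sketched in a few lines. The group totals, coefficients, and n = 14 are taken from the text and Table 1; the function and variable names are illustrative assumptions, not part of the original procedure.

```python
# Seven-level orthogonal polynomial coefficients (degrees 1 through 6).
PHI = [
    [-3, -2, -1,   0,  1,  2, 3],
    [ 5,  0, -3,  -4, -3,  0, 5],
    [-1,  1,  1,   0, -1, -1, 1],
    [ 3, -7,  1,   6,  1, -7, 3],
    [-1,  4, -5,   0,  5, -4, 1],
    [ 1, -6, 15, -20, 15, -6, 1],
]

# Group totals for the 9- through 15-inch test stimuli; 14 subjects per group.
totals = [21.34, 37.01, 59.87, 54.09, 54.10, 39.03, 51.86]
n = 14

def cross_product(coeffs, group_totals):
    """P_j: sum over levels of coefficient times group total."""
    return sum(c * t for c, t in zip(coeffs, group_totals))

def component_ss(coeffs, group_totals, n_per_group):
    """Formula [3]: P_j squared over n times the sum of squared coefficients."""
    p = cross_product(coeffs, group_totals)
    return p ** 2 / (n_per_group * sum(c * c for c in coeffs))

P = [cross_product(row, totals) for row in PHI]
SS = [component_ss(row, totals, n) for row in PHI]

# Between-groups sum of squares computed directly from the totals, for the check
# that the six components add up to it.
grand_total = sum(totals)
ss_between = sum(t * t for t in totals) / n - grand_total ** 2 / (n * len(totals))
```

Because the six polynomials exhaust the 6 degrees of freedom between groups, `sum(SS)` and `ss_between` agree up to rounding, which is the computational check described above.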

When the mean squares resulting 
from these components are tested 
against the error variance, it is found 
that only the quadratic component is 
statistically significant, although the 
linear component approaches signifi- 
cance. It is therefore possible to 
conclude that there is a significant 

quadratic trend in the data of Fig- 
ure 1, or that there is a statistically 
significant generalization effect. The 
other orthogonal components have 
been tested in Table 1, but, for 
reasons given by Duncan (4), it is
not a particularly wise procedure to
test components for which there are 
not specific a priori hypotheses. 

The above analysis can generally 
be used when a different group of 
subjects is run under each of the 
levels of the independent variable. 
When the same subjects have been 
used in repeated tests under different 
values of the independent variable, 
it is necessary to subdivide the 
SSerror into orthogonal components 
in order to obtain appropriate error 
terms for testing each of the com- 
ponents of the over-all trend. The 
procedure for subdividing the inter- 
action will be outlined as a special 
case of the general procedure de- 
scribed below for repeated measure- 
ments. 

A word of caution should be in- 
serted here with respect to logarith- 
mic, exponential, and trigonometric 
functions. If there are theoretical 
bases for expecting such transcen- 
dental components in the data, and 
the data are highly reliable, all of 
the orthogonal polynomial compo- 
nents may be statistically significant, 
and a more definitive test may be 
obtained by fitting the appropriate 
parameters. This type of test is
beyond the scope of the present 
paper. 


ORTHOGONAL POLYNOMIAL ANALYSIS
OF TRENDS BASED UPON REPEATED
MEASUREMENTS OF THE SAME Ss

Alexander (1) has presented a
very useful technique for the analysis
of trends based upon repeated measurements
on the same individuals.
The present paper extends the procedures
outlined by Alexander by
further analysis of the orthogonal
components of the trend and by
providing for the separation of orthogonal
components of differences between
groups which may be permitted
by the logic of the experimental
design. The notation used is adapted
from Alexander (1).

The general functional plan of the
analysis is as follows: We have a
total of n subjects in p groups, and
each subject has contributed a score
on each of k trials or stages of the
experiment, so that there will be nk
scores in all. These give the “individual
trends.” Each group will then
have a mean on each trial so that
there are pk group by trial means
which give the “group trends.” There
will also be a combined group mean
on each of the k trials which gives the
“over-all trend” of the scores. Ignoring
the scaled or sequential character
of the trials variable, the total sum of
squares could be analyzed into:
$SS_{\text{trials}}$ with k-1 df;
$SS_{\text{groups}}$ with p-1 df;
$SS_{\text{individuals}}$ with n-p df;
$SS_{\text{groups} \times \text{trials}}$ with (k-1)(p-1) df;
and $SS_{\text{individuals} \times \text{trials}}$ with (k-1)(n-p) df.
But trials is a scaled variable,
and the Alexander (1) scheme of
trend analysis subdivides $SS_{\text{trials}}$ into
$SS_{\text{over-all slope}}$ with 1 df and
$SS_{\text{over-all deviations from linearity}}$ with k-2 df;
similarly $SS_{\text{groups} \times \text{trials}}$ is broken
down into $SS_{\text{between group slopes}}$ with
p-1 df and $SS_{\text{group deviations from linearity}}$
with (p-1)(k-2) df; these deal with
the differences between groups in
linear components and nonlinear
components of trend, respectively;
and the $SS_{\text{individuals} \times \text{trials}}$ is broken
down into $SS_{\text{between individual slopes}}$ with
n-p df and $SS_{\text{individual deviations from estimation}}$
with (n-p)(k-2) df, which
consist of the differences between
individuals in linear components and
nonlinear components of trend, respectively.
Our method goes on to
separate the $SS_{\text{over-all deviations from linearity}}$,
$SS_{\text{group deviations from linearity}}$,
and $SS_{\text{individual deviations from estimation}}$
into the quadratic, cubic, quartic,
etc., components of the over-all trend,
the differences between group trends,
and the differences between individual
trends, respectively, so that we have
between group quadratics, between
group cubics, etc., and between individual
quadratics, between individual
cubics, etc.
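
The bookkeeping of degrees of freedom in this partition can be sketched as follows. The symbols n, p, and k follow the text; the function name and dictionary keys are illustrative, and the values n = 24, p = 6, k = 5 are those of the worked example that follows (six groups of four Ss, five stages).

```python
def trend_degrees_of_freedom(n, p, k):
    """Degrees of freedom for each part of the trend partition.

    n: total number of subjects, p: number of groups, k: number of trials.
    """
    return {
        "over-all slope": 1,
        "over-all deviations from linearity": k - 2,
        "groups": p - 1,
        "between group slopes": p - 1,
        "group deviations from linearity": (p - 1) * (k - 2),
        "individuals": n - p,
        "between individual slopes": n - p,
        "individual deviations from estimation": (n - p) * (k - 2),
    }

df = trend_degrees_of_freedom(n=24, p=6, k=5)

# The parts should account for all nk - 1 degrees of freedom.
total_df = sum(df.values())
```

Each "deviations" entry is what the present method further splits into quadratic, cubic, and quartic components, each carrying 1, p-1, or n-p df respectively.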

If the p groups form an orthogonal 
design, e.g., variables A and B with 
a-1 and b-1 df, respectively, and an
AB interaction with (a-1)(b-1) df,
then each sum of squares between 
groups (means, slopes, quadratics, 
cubics, etc.) can be separated into 
an A, a B, and an AB interaction 
component. This extension of Alex- 
ander’s procedure is frequently help- 
ful, and it too has been included in 
the example below. 

As an example, selected data are 
presented in Table 2 and Figure 2 
from an unpublished experiment by 
Grant, Kuboyama, and Patel on the 
influence of electric shock stimulation 
on the conceptual behavior of “anx- 
ious” and “nonanxious” Ss. In Table 
2, on the left-hand side appear the 
perseverative error scores on the 
second, third, fourth, fifth, and sixth 
stages of the Wisconsin Card Sorting 
Test (WCST) as subjected to the 
square-root transformation (2). The 
average values of the transformed 
scores have been plotted for each 
group on the successive stages in 
Figure 2. In this experiment high
and low anxiety groups were selected
on the basis of the Taylor Anxiety
Scale (10) and were subdivided into
three subgroups each, so that the
subgroups could receive 0, 2, and 12
electric shocks during the course of
the experiment. The Ss receiving
two shocks received them at the first
stage of the WCST and the Ss receiving
12 shocks received two shocks
per stage on each of the six stages of
the WCST. High-anxiety groups
were designated H0, H2, and H12,
respectively, and the low anxiety
groups were designated L0, L2, and
L12, respectively. Data from selected
sets of four of the 17 Ss in each original
experimental group appear in Table 2.

From Figure 2 it is apparent that,
as expected, two of the high-anxiety
groups showed progressive deterioration
in performance during the course
of the WCST. In contrast, the low-anxiety
groups showed progressive
improvement throughout the experiment
as did also the H2 group. Over-all
tests of the differences between
groups and differences between stages
are not particularly revealing with
respect to the trends obtained. What
is needed is a test which will show
whether or not the improvement in
the low-anxiety groups as contrasted
with the failure of the high-anxiety
groups to improve is statistically
significant. If the low-anxiety groups
improve and the high-anxiety groups
deteriorate, this should result in
statistically significant differences in
the linear components of the trends.

The computational procedure for
the orthogonal polynomial trend
analysis is essentially simple and
straightforward, but there are frequent
opportunities for error. A
number of excellent checking procedures
are available, but as with all
calculations, extreme care must be
exercised to insure accuracy. Subscripts
will be used as follows: i refers
to groups, j refers to the order of the
polynomial, k refers to the trial number,
and l refers to individuals. The
total number of subjects is n; the
number of subjects in the ith group
is n_i; the number of trials is K; and
the total number of groups is p.


TABLE 2
SELECTED TRANSFORMED PERSEVERATIVE ERRORS AND SUMS OF
CROSS-PRODUCTS FOR TREND ANALYSIS

                        Stage
Group  Subject    2     3     4     5     6       I      P1     P2     P3     P4
L0        3      1.7   1.0   1.7   1.0   1.0     6.4    -1.4    0.0   -0.7    4.9
         35      3.0   1.7   1.0   1.0   1.0     7.7    -4.7    3.3   -0.6   -0.8
         17      2.4   1.4   3.0   2.4   0.0     9.2    -3.8   -5.0   -4.4    5.2
         21      1.4   1.0   1.0   1.0   0.0     4.4    -2.8   -1.2   -1.4   -0.6
       Tot.      8.5   5.1   6.7   5.4   2.0    27.7   -12.7   -2.9   -7.1    8.7

L2        5      2.0   2.4   1.0   0.0   0.0     5.4    -6.4   -0.4    2.8   -1.6
         16      2.8   1.0   1.0   1.0   1.0     6.8    -3.6    3.6   -1.8    1.8
          ?      5.3   2.0   0.0   0.0   1.0     8.3   -10.6   10.6   -0.3   -1.7
         18      2.0   4.1   1.4   1.4   1.4    10.3    -3.9   -1.5    4.8  -10.2
       Tot.     12.1   9.5   3.4   2.4   3.4    30.8   -24.5   12.3    5.5  -11.7

L12       7      2.2   1.7   1.7   1.4   1.0     8.0    -2.7   -0.1   -0.6    1.0
          ?      4.4   0.0   1.7   1.0   0.0     7.1    -7.8    4.4   -6.4   10.6
          ?      2.0   4.7   1.4   1.0   1.0    10.1    -5.7   -2.5    6.4  -11.4
         27      2.8   1.7   2.0   0.0   0.0     6.5    -7.3   -0.1    0.6    8.0
       Tot.     11.4   8.1   6.8   3.4   2.0    31.7   -23.5    1.7    0.0    8.2

H0       24      1.4   1.0   0.0   1.4   1.0     4.8    -0.4    2.4   -1.2   -7.2
          ?      1.7   2.4   2.0   3.7   2.2    12.0     2.3   -2.3   -2.1   -8.5
         43      1.7   4.1   1.7   4.0   3.2    14.7     2.9   -1.7    1.7  -17.3
         44      2.8   1.0   2.4   3.7   3.2    13.1     3.5    2.5   -5.0    1.6
       Tot.      7.6   8.5   6.1  12.8   9.6    44.6     8.3    0.9   -6.6  -31.4

H2       12      2.0   1.7   2.4   1.0   1.4     8.5    -1.9   -0.7    0.8    7.0
         25      1.7   1.4   0.0   0.0   0.0     3.1    -4.8    2.0    1.1   -3.9
         32      1.4   0.0   1.0   0.0   0.0     2.4    -2.8    0.8   -1.4    7.4
         48      1.4   1.4   1.0   1.0   0.0     4.8    -3.2   -1.6   -0.6   -2.2
       Tot.      6.5   4.5   4.4   2.0   1.4    18.8   -12.7    0.5   -0.1    8.3

H12       ?       ?     ?     ?     ?     ?       ?       ?      ?      ?      ?
          ?       ?     ?     ?     ?     ?       ?       ?      ?      ?      ?
         39      1.0   2.0   1.4    ?     ?       ?       ?    -1.0     ?      ?
         40      4.2   1.7   2.2   4.9   4.7    17.7     4.2    6.8   -5.9   -4.3
       Tot.      8.0   7.1   6.6   9.7   9.9    41.3     6.4    5.8   -3.3   -9.7

Total  T_k      54.1  42.8  34.0  35.7  28.3   194.9   -58.7   18.3  -11.6  -27.6

? Entry illegible in the source copy.


The first step is to enter all of the
original scores, by groups, into a
table such as Table 2. In the left-hand
part of the table the scores of
each subject are entered in a separate
row, and space is kept for group
totals. The marginal totals are then
computed, the row marginals in the
column headed I, and the column
marginals for each group in
the rows labeled $T_{1k}$, $T_{2k}$, etc. Thus,
for the first row I = 1.7+1+1.7+1
+1 = 6.4. In the first column of the
first group the $T_{ik} = T_{11}$ = 1.7+3+2.4
+1.4 = 8.5. For each group, the sum
of the I's and the sum of the $T_{ik}$'s
must be equal, which serves as a
check on the computation of the
marginal totals. At the foot of the
table, the totals for each stage, $T_k$,
54.1, 42.8, ..., 28.3, are computed
and these add up to the grand total,
GT, of 194.9 which must equal the
sum of the group subtotals; i.e.,
27.7 + 30.8 + 31.7 + 44.6 + 18.8 + 41.3
= 194.9.
The next step is to compute and
check the P’s. First the table of the
orthogonal polynomials, $\phi_{jk}$, for the
number of trials or stages is found in
Fisher and Yates (6) or Pearson and
Hartley (8), and these are given in
Table 3 for the present example. To
obtain the P’s for each subject, the
following formula is used:
$P_j = \sum_k Y_k \phi_{jk}$. For example, for the
first subject $P_1$ = 1.7(-2)+1(-1)
+1.7(0)+1(1)+1(2) = -1.4,
$P_2$ = 1.7(2)+1(-1)+1.7(-2)+1(-1)
+1(2) = 0.0, $P_3$ = 1.7(-1)+1(2)
+1.7(0)+1(-2)+1(1) = -0.7, and
$P_4$ = 1.7(1)+1(-4)+1.7(6)+1(-4)
+1(1) = 4.9. The best procedure is
to compute all the $P_1$’s and then all
the $P_2$’s, etc. As a check for each
group, the sum of the individual $P_j$’s
for that group should be equal to
the sum of the products of the stage
totals, $T_{ik}$, for the group, times the
values of the orthogonal polynomials
for each stage; i.e., for the jth polynomial,
$\sum_k T_{ik}\phi_{jk} = \sum P_j$. For
example, the $P_1$’s for the first group,
-1.4, -4.7, -3.8, and -2.8 add up
to -12.7. Also 8.5(-2)+5.1(-1)
+6.7(0)+5.4(1)+2.0(2) = -12.7,
which serves as a check on the accuracy
of the $P_1$’s for the first group.
A similar check must be made with
each polynomial in each group and
with the over-all stage totals at the
foot of Table 2.


TABLE 3
VALUES OF THE ORTHOGONAL POLYNOMIALS
FOR ANALYSIS OF THE PERSEVERATIVE
ERROR TRENDS

                  Stage
          2     3     4     5     6     Σ φ_jk²
φ1k      -2    -1     0     1     2       10
φ2k       2    -1    -2    -1     2       14
φ3k      -1     2     0    -2     1       10
φ4k       1    -4     6    -4     1       70
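
The computation of the P's and the accompanying group check can be sketched as follows. The coefficients are those of Table 3, and the scores, stage totals, and individual P1 values are the ones quoted in the text for the first subject and first group; the function name is illustrative.

```python
import math

# Orthogonal polynomial coefficients for five equally spaced stages (Table 3).
PHI5 = [
    [-2, -1,  0,  1, 2],   # linear
    [ 2, -1, -2, -1, 2],   # quadratic
    [-1,  2,  0, -2, 1],   # cubic
    [ 1, -4,  6, -4, 1],   # quartic
]

def cross_products(scores):
    """P_1 ... P_4 for one row of five stage scores (or five stage totals)."""
    return [sum(s * c for s, c in zip(scores, row)) for row in PHI5]

# First subject of the first group.
subject_P = cross_products([1.7, 1, 1.7, 1, 1])

# Check for the first group: P's of the stage totals vs. summed individual P1's.
group_P = cross_products([8.5, 5.1, 6.7, 5.4, 2.0])
sum_individual_P1 = sum([-1.4, -4.7, -3.8, -2.8])
```

Since each P is a linear function of the scores, applying the coefficients to the group's stage totals must reproduce the sum of the individual P's, which is why the check works for every polynomial.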

The next step is to obtain the
quantities specified by the equations
in Table 4. Table 4 consists of three
columns, U, V, and W, which deal
with the trials by individuals measures,
trials by groups measures, and
over-all trials measures, respectively.
Associated with each entry in Table
4 is the corresponding df. The computations
indicated in Table 4 have
been carried out in Table 5 so that
they can be identified from the corresponding
entries in Table 2. If the
work has been done correctly, the first
entry in each column is the sum of
the remaining entries, and this is
an important check.

The only step remaining before the
final computations of the sums of
squares for the summary of the analysis
of variance is to obtain subtotals
of the groups on I, P1, P2, P3, and P4
for the three levels of shock, 0, 2, and
12, and the two levels of anxiety, L
and H. This has been done in Table
6 where the entries are simply subtotals
from the right-hand sides of
Table 2.


TABLE 4
COMPUTATION OF THE SUMS OF SQUARES OF THE ORTHOGONAL COMPONENTS OF TREND

Measure | U: Trials by Individuals | V: Trials by Groups | W: Trials

Y   | $U_Y = \sum Y^2$ | $V_Y = \sum_i \sum_k T_{ik}^2 / n_i$ | $W_Y = \frac{1}{n} \sum_k T_k^2$
    | df = nK | df = pK | df = K

I   | $U_I = \frac{1}{K} \sum I^2$ | $V_I = \sum_i \big(\sum_k T_{ik}\big)^2 / K n_i$ | $W_I = GT^2 / nK$
    | df = n | df = p | df = 1

P1  | $U_{P_1} = \sum P_1^2 / \sum_k \phi_{1k}^2$ | $V_{P_1} = \sum_i \big(\sum P_1\big)_i^2 / n_i \sum_k \phi_{1k}^2$ | $W_{P_1} = \big(\sum P_1\big)^2 / n \sum_k \phi_{1k}^2$
    | df = n | df = p | df = 1

P2  | $U_{P_2} = \sum P_2^2 / \sum_k \phi_{2k}^2$ | $V_{P_2} = \sum_i \big(\sum P_2\big)_i^2 / n_i \sum_k \phi_{2k}^2$ | $W_{P_2} = \big(\sum P_2\big)^2 / n \sum_k \phi_{2k}^2$
    | df = n | df = p | df = 1

P3  | $U_{P_3} = \sum P_3^2 / \sum_k \phi_{3k}^2$ | $V_{P_3} = \sum_i \big(\sum P_3\big)_i^2 / n_i \sum_k \phi_{3k}^2$ | $W_{P_3} = \big(\sum P_3\big)^2 / n \sum_k \phi_{3k}^2$
    | df = n | df = p | df = 1

P4  | $U_{P_4} = \sum P_4^2 / \sum_k \phi_{4k}^2$ | $V_{P_4} = \sum_i \big(\sum P_4\big)_i^2 / n_i \sum_k \phi_{4k}^2$ | $W_{P_4} = \big(\sum P_4\big)^2 / n \sum_k \phi_{4k}^2$
    | df = n | df = p | df = 1

oe final step remains of entering 
the sums of squares into the analysis 
a variance summary table as has 

een done in Table 7. All the sums of 
Squares in Table 7, except those hav- 
ing to do with the subdivision of com- 
oe of the sums of squares re- 
lated to anxiety, shock, and their 
interation, can be computed directly 


from the sums of squares entries in 
Table 5 as indicated in the column 
of Table 7 headed “Computation.” 
The number of df for each row of 
Table 7 is obtained by applying the 
“Computation” formulae to the df 
entries in Table 5. Then the sub- 
division of the components of varia- 
tion between group means and - be- 
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TABLE 5 
COMPUTED VALUES OF THE TABLE 4 COMPONENTS OF THE TREND 
ANALYSIS OF PERSEVERATIVE ERRORS 
U: TrialsX Individuals V: TrialsX Groups W: Trials 
1 
Y | Uy=1.74+124 +--+ Vrs (8545.14 -+ - w=ž (54.12-++-42.8%-+ +++ 
+4,9+-4.7? = 482.5300 +9.7°+-9.9") = 390.0825 +28.3?) = 332.9179 
120 df | 30 df i 5 df 
1 | 1 2 
I Ur=— (6.447.724 --- Vi=— (27.77+-30.8+4--- | W= (1949) =316.5501 
5 20 20 
+17.7?) =376.0140 +41.3?) =338.4555 
24 df 6 df ldf 
1 1 = 2 
Pi | Urm gg LAAD | Vogg 127249 Wr E 43570 
+-+.. +(4.2)3) +- +(6.4)3] 
=47.3850 =39.6232 
24 df 6df laf 
1 1 
P: | Urs= 77 O.0P+6.3)+ +++ | Ve (294023 | We, = — 
+-(6.8)"]= 19.9521 +++ 4(5.8)3] 
=3.5230 
akd 6 df 1df 
See | ee _ ee 
1 
=—[(- 1 — ” 
Ps | Une igl-O.+(-06)" | VeeS Wwe E 0.5607 
Fas + HESON + (3.3)? 
= 23,9300 =3.3780 
24 df 6af La 
eae nee 
1 
ss 1 hat 
Pe eagles | Vem aI —11.7 | We Aoa 
Re st +(—4.3)3] +++ +(-9.7)3] 
= 15.2488 =5.1027 
24 df 6df 1af 
Sc em n 


tween group trends can be obtained from the entries in Table 6 by the
usual analysis of variance methods; i.e.,
$SS_q = (\sum T_q^2 / k n_q) - C$, where the subscript, q, represents
the subtotal for anxiety or shock level. Thus, for example, the sum of
squares between group means for anxiety is equal to
$\frac{1}{60}(90.2^2 + 104.7^2) - W_I = 1.7521$. The sum of squares
between group means for shock is equal to
$\frac{1}{40}(72.3^2 + 49.6^2 + 73.0^2) - W_I = 8.8611$, and the
interaction between group means can be obtained by subtracting these
two sums of squares from 21.9054, line B in Table 7, which gives
11.2922. Similarly, the sum of squares for the linear component of the
difference in trends due to anxiety is:
$\frac{1}{120}(P_{1.L}^2 + P_{1.H}^2) - W_{P_1}$. The 120 is 12, the
number of subjects per anxiety level, times 10, the sum of the squared
polynomial coefficients. The sum of squares for the linear component
due to shock is
$\frac{1}{80}(P_{1.0}^2 + P_{1.1}^2 + P_{1.2}^2) - W_{P_1}$, where the
80 is 8, the number of subjects per shock level, times 10, the sum of
the squared polynomial coefficients. The sum of squares for the linear
component of the interaction is the sum of squares of the linear
component between group trends, line C.1 in Table 7, minus the sum of
squares for the linear component due to anxiety and the sum of squares
for the linear component due to shock. The actual numerical values for
anxiety will be
$\frac{1}{120}[(-60.7)^2 + (2.0)^2] - 14.3570 = 16.3804$. The sum of
squares for shock is
$\frac{1}{80}[(-4.4)^2 + (-37.2)^2 + (-17.1)^2] - 14.3570 = 6.8381$.
The sum of squares for the linear component of interaction will be
$25.2662 - 16.3804 - 6.8381 = 2.0477$. Similar quantities are computed
for the quadratic, cubic, and quartic components of the trend as
indicated in the Computation column (of sums of squares) in Table 7.
The general computing formula is:

$$SS = \frac{\sum_s P_{k.s}^2}{n_s \sum c_k^2} - W_{P_k},$$

where the subscript, s, refers to a subtotal on $P_k$ for a single
level of shock or anxiety.


TABLE 6

SUBTOTALS OF SUMS AND CROSS-PRODUCTS FOR
COMPUTATION OF “SHOCK” AND “ANXIETY”
SUMS OF SQUARES

Measure    Shock
I          I.0 = 72.3       I.1 = 49.6       I.2 = 73.0
P1         P1.0 = −4.4      P1.1 = −37.2     P1.2 = −17.1
P2         P2.0 = −2.0      P2.1 = 12.8      P2.2 = 7.5
P3         P3.0 = −13.7     P3.1 = 5.4       P3.2 = −3.3
P4         P4.0 = −22.7     P4.1 = −3.4      P4.2 = −1.5

Measure    Anxiety
I          I.L = 90.2       I.H = 104.7
P1         P1.L = −60.7     P1.H = 2.0
P2         [illegible]
P3         [illegible]
P4         P4.L = 5.2       P4.H = −32.8
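The arithmetic for the anxiety, shock, and interaction sums of squares can be checked with a short Python sketch. The subtotals, divisors (60, 40, 120, 80), correction terms, and the two between-group totals are the figures quoted in the text and in Tables 4 and 6; the function name `ss_from_subtotals` and the variable names are illustrative only and are not part of the original procedure.

```python
# Subtotals from Table 6 (sums of scores and of linear contrasts per level).
I_ANXIETY = [90.2, 104.7]
I_SHOCK = [72.3, 49.6, 73.0]
P1_ANXIETY = [-60.7, 2.0]
P1_SHOCK = [-4.4, -37.2, -17.1]

# Correction terms (W column of Table 4) and between-group totals
# (lines B and C.1 of Table 7) as quoted in the text.
W_I = 316.5501
W_P1 = 14.3570
SS_GROUP_MEANS = 21.9054
SS_GROUP_LINEAR = 25.2662


def ss_from_subtotals(subtotals, divisor, correction):
    """Sum of squared subtotals over the divisor, less the correction term."""
    return sum(t * t for t in subtotals) / divisor - correction


# Between-group means: anxiety, shock, and their interaction by subtraction.
ss_anxiety_means = ss_from_subtotals(I_ANXIETY, 60, W_I)
ss_shock_means = ss_from_subtotals(I_SHOCK, 40, W_I)
ss_interaction_means = SS_GROUP_MEANS - ss_anxiety_means - ss_shock_means

# Linear component of between-group trends, partitioned the same way.
ss_anxiety_linear = ss_from_subtotals(P1_ANXIETY, 120, W_P1)
ss_shock_linear = ss_from_subtotals(P1_SHOCK, 80, W_P1)
ss_interaction_linear = SS_GROUP_LINEAR - ss_anxiety_linear - ss_shock_linear

for name, value in [
    ("anxiety, means", ss_anxiety_means),
    ("shock, means", ss_shock_means),
    ("interaction, means", ss_interaction_means),
    ("anxiety, linear", ss_anxiety_linear),
    ("shock, linear", ss_shock_linear),
    ("interaction, linear", ss_interaction_linear),
]:
    print(f"{name}: {value:.4f}")
```

The interaction terms are obtained by subtraction, as in the text, so any rounding in the quoted totals carries through to them.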


The mean squares of Table 7 are 
computed in the usual way by divid- 
ing the sums of squares by the ap- 
propriate degrees of freedom. It 
remains to determine the appropriate 
error variances for the F tests. In 
general, these are the between-indi- 
viduals mean squares, and the row 
entry for each error term has been 
listed in the next to last column of 
Table 7. The between-group-means 
mean squares should be tested against 
the between-individual-means error 
term. The between-group-trends- 
linear terms and the over-all-trends- 
linear term should be tested against 
the between-individuals-linear mean 
square; and, in general, the quadratic 
terms should be tested against the 
between-individuals-quadratic mean 
square, the cubic terms should be 
tested against the between-individ- 
ual-cubic mean square, and the quar- 
tic terms should be tested against the 
between-individuals-quartic mean 
square. Appropriate tests for over-
all trend, line A of Table 7, and be-
tween-group trends, line C of Table
7, are a little difficult to justify, but
probably the best error term for these 
would be the between-individual- 
trends mean square, line E of Table 7. 
These F tests have been made and 
are entered in the last column of 
Table 7. 
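The rule of testing each component against the between-individuals mean square of the same order can be sketched for the linear terms as follows. The sums of squares are those quoted in the text; the error sum of squares is taken as the U entry minus the V entry for P1 in Table 4, with 24 − 6 = 18 degrees of freedom. The resulting F ratios are computed here for illustration and are not copied from Table 7; the name `f_ratio` is an assumption of this sketch.

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """Ratio of the effect mean square to the error mean square."""
    return (ss_effect / df_effect) / (ss_error / df_error)


# Between-individuals linear error term: U minus V for P1 (Table 4).
U_P1, V_P1 = 47.3850, 39.6232
ss_error_linear = U_P1 - V_P1
df_error_linear = 24 - 6

# Linear components of between-group trends, as quoted in the text.
# Degrees of freedom: 2 anxiety levels, 3 shock levels.
f_anxiety = f_ratio(16.3804, 1, ss_error_linear, df_error_linear)
f_shock = f_ratio(6.8381, 2, ss_error_linear, df_error_linear)
f_interaction = f_ratio(2.0477, 2, ss_error_linear, df_error_linear)

print(f"error mean square (linear): {ss_error_linear / df_error_linear:.4f}")
print(f"F anxiety (1, 18 df): {f_anxiety:.2f}")
print(f"F shock (2, 18 df): {f_shock:.2f}")
print(f"F interaction (2, 18 df): {f_interaction:.2f}")
```

The same function applies to the quadratic, cubic, and quartic terms once the corresponding U and V entries are substituted.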

The significant F’s of Table 7 may be briefly interpreted. The
significant over-all trend means simply that the over-all average
scores vary from stage to stage during the test. Since only the linear
component of the over-all trend is statistically significant, it may
be concluded that the over-all trend is essentially linear. Higher
order components of the over-all trend fail to attain significance.
There are no significant differences between group means due to
anxiety, shock, or the interaction of anxiety and shock. The general
between-group-trends F is not significant. The linear components of
the differences in group trends are, however, highly significant, and
in particular, the anxious groups tend to have a less negative slope
than the nonanxious groups. There are also significant differences
between the different shock groups in the linear component. None of
the higher order components of the differences between group trends
was significant. (This was anticipated, but the tests were made to
illustrate procedures.) There are significant individual differences
in average performance as shown by the significant F between
individual means, but this will usually be found in the case of
reliable measures of performance.

The procedures outlined above can readily be applied to longer series
of trials and can be extended to more complicated experimental designs
with higher-order interactions, although in many instances
interpretations will become very obscure. The number of subjects may
vary from group to group, but if an orthogonal design is used the
numbers of subjects in rows and columns must be proportional (9, pp.
281-284). One other case which should be described briefly is that in
which there is a single curve made up of average values from repeated
measurement on a number of individuals. This would be the case if the
Grant-Schiller data reported earlier had been obtained on a single
group of subjects tested at all the different generalization test
stimulus values. (Then each subject would have contributed a score for
each generalization test stimulus.) In the case of repeated
measurements the over-all trend can be analyzed as described above in
Table 7. There will be no between group differences but between
individual means and between individual trends measures can be
obtained and separated into the orthogonal components. In this case
the V column of Table 4 will not be computed, and the
between-individual-means sum of squares will be $U_I - W_I$, and the
corresponding linear, quadratic, cubic, etc., terms will consist of
the quantities from the W column subtracted from the corresponding
quantities in the U column instead of the V’s from the U’s as in the
present Table 7. The error terms will correspond to those used in the
analysis in Table 7.
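For the single-group case just described, the U and W columns can be sketched in Python as below. The orthogonal polynomial coefficients are the standard ones for five equally spaced trials (sums of squares 10, 14, 10, and 70); the scores are hypothetical and serve only to make the sketch runnable, and the function name `trend_components` is an assumption, not the authors' notation.

```python
# Standard orthogonal polynomial coefficients for five equally spaced trials.
COEFFS = {
    "linear": [-2, -1, 0, 1, 2],
    "quadratic": [2, -1, -2, -1, 2],
    "cubic": [-1, 2, 0, -2, 1],
    "quartic": [1, -4, 6, -4, 1],
}


def trend_components(scores):
    """Return U, W, and U - W for the means and each trend component.

    scores: one list of five trial scores per subject (single group).
    """
    n = len(scores)
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    u_i = sum(t * t for t in totals) / k
    w_i = sum(totals) ** 2 / (n * k)
    out = {"means": {"U": u_i, "W": w_i, "between_individuals": u_i - w_i}}
    for name, c in COEFFS.items():
        sum_sq = sum(x * x for x in c)
        # One contrast value per subject: coefficients times trial scores.
        contrasts = [sum(ci * xi for ci, xi in zip(c, row)) for row in scores]
        u = sum(p * p for p in contrasts) / sum_sq
        w = sum(contrasts) ** 2 / (n * sum_sq)
        out[name] = {"U": u, "W": w, "between_individuals": u - w}
    return out


# Hypothetical scores for three subjects on five trials.
scores = [
    [9.0, 7.0, 6.0, 4.0, 3.0],
    [8.0, 8.0, 5.0, 5.0, 2.0],
    [6.0, 5.0, 5.0, 3.0, 3.0],
]
components = trend_components(scores)
for name, values in components.items():
    print(name, {key: round(v, 4) for key, v in values.items()})
```

Because the five contrasts (including the mean) are mutually orthogonal, the U entries add up to the total sum of squared scores, which gives a convenient arithmetic check.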

The above procedures do not con- 
stitute a universally applicable rou- 
tine method of analyzing and com- 
paring trends. No such method can 
exist, and no routine can substitute 
for an experimenter’s insight and in- 
genuity. We have found the pro- 
cedures extremely useful, however, in 
comparing curves with respect to 
slopes, curvatures, sharpness of inflections,
etc., so that we can recommend
them highly for testing specific
hypotheses relating to trends.


METHODS AND TERMINOLOGY IN STUDIES OF
TIME ESTIMATION¹


DALBIR BINDRA AND HÉLÈNE WAKSBERG
McGill University 


In experiments on time estimation, S is required to make judgments
about temporal durations. The duration that S is asked to judge is the
standard; the estimate of the standard made by S is called the
judgment. The studies in the area of time estimation are concerned
primarily with the relative magnitudes of, and the general relations
between, the standard and the corresponding judgment. Despite this
relatively simple design of time estimation studies, they are quite
confusing to read, and an attempt to make generalizations from the
results of different studies gets one involved in many apparent or
real contradictions. These troubles seem to arise from the fact that
different investigators not only use different methods in their
experiments, but also employ different sets of terminologies to
describe their results. Thus, one investigator uses the method of
reproduction in the study of a problem while another employs the
method of verbal estimation for the same problem. Similarly, one
researcher chooses to describe his results in terms of over- or
underestimation of time, while another prefers to talk in terms of
relative speed of internal and external clocks. This makes it
difficult to compare studies, unless the exact relations between the
different methods, and the different expressions used for describing
results, have been clarified. Some investigators, for example Eson and
Kafka (2) and Clausen (1), have pointed out the existence of this
difficulty, but have dealt with it only in a limited way, within the
framework of their particular problems. Even the standard textbooks of
experimental psychology (for example, 5) have not systematically
examined the problem of the equivalence or lack of equivalence of the
different methods and terms. This is the task of the present note.

¹ This study was supported by a research grant (A.P.12) from the
National Research Council of Canada. It is based in part on an
[illegible] thesis, “Serial-position gradient in time [illegible],”
submitted to McGill University, [illegible].


METHODS 


Three main methods are commonly used in time estimation experiments.
In the method of verbal estimation, E delimits a given interval
operatively (i.e., demonstrates the duration of the standard), and S
is asked to estimate verbally its duration (the judgment) in terms of
seconds or minutes. In the method of production, S is instructed to
delimit operatively an interval (the judgment) of a given duration
(the standard) stated verbally by E. In the method of reproduction, E
operatively delimits an interval (the standard) and then asks S to
reproduce operatively an interval (the judgment) of the same duration.
(A variation of the method of reproduction is the method of
comparison. In this method E presents two intervals consecutively and
S is asked to judge their relative duration by saying which one is the
longer. It resembles the method of reproduction in that both involve
an operative presentation of the standard and a judgment that also
refers to an operatively delimited duration.) Thus we see that both
the standard and the judgment may be defined in either of two
different ways, by operative demonstration or by verbal statement. The
term elapsed time, used by some authors, is used synonymously either
with the standard or with the judgment, and does not require separate
discussion here. Table 1 shows how the three methods differ with
respect to the defining operations of the terms “standard,”
“judgment,” and “elapsed time.”

The three methods also differ with 
respect to another variable. These 
differences are also shown in Table 1. 
This variable is objective vs. subjective 
definition of the terms “standard,” 
and “judgment.” (Elapsed time is 
always defined in terms of objective 
time.) The standard is sometimes de- 
fined in terms of objective or clock 
time; at other times it is defined with 
reference to subjective or personal 
time. The judgment may also be de- 
fined in either one of these two ways. 
Thus, logically, we can combine 
standard and judgment in four different
ways: (a) standard defined in
terms of objective time and judgment
defined in terms of subjective
time; (b) standard defined in terms
of subjective time and judgment defined
in terms of objective time; (c)
standard defined in terms of objective 
time and judgment defined in terms 
of objective time; and (d) standard 
defined in terms of subjective time 
and judgment defined in terms of 
subjective time. The last of these 
combinations is empirically meaning- 
less. We have only three meaningful 
combinations and, correspondingly, 
three main methods of time estima- 
tion (Table 1). Thus we see that the 
different methods used in time esti- 
mation experiments represent the 
logical combinations of the two ways 
(in terms of objective vs. subjective 
time) of defining standard and judg- 
ment. 
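The four logical combinations and the three methods they yield can be laid out in a brief Python sketch. The mapping follows the assignments given in Table 1; the names `METHOD_BY_COMBINATION` and `classify` are illustrative and do not appear in the original note.

```python
from itertools import product

# (definition of standard, definition of judgment) -> method, per Table 1.
METHOD_BY_COMBINATION = {
    ("objective", "subjective"): "verbal estimation",
    ("subjective", "objective"): "production",
    ("objective", "objective"): "reproduction",
}


def classify(standard, judgment):
    """Return the method for a combination, or None if it is meaningless."""
    return METHOD_BY_COMBINATION.get((standard, judgment))


# Enumerate all four combinations of objective vs. subjective definition.
combinations = {
    pair: classify(*pair)
    for pair in product(("objective", "subjective"), repeat=2)
}

for (standard, judgment), method in combinations.items():
    label = method if method is not None else "empirically meaningless"
    print(f"standard {standard}, judgment {judgment}: {label}")
```

Only the combination in which both standard and judgment are subjective has no corresponding method, which is the point made in the text.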


TABLE 1

SIGNIFICANCE OF THE TERMS “STANDARD,” “JUDGMENT,” AND “ELAPSED TIME”
IN VARIOUS METHODS OF TIME ESTIMATION

Standard
    Verbal Estimation: Interval delimited operatively by E; refers to
        objective (clock) time.
    Production: Interval stated verbally by E; refers to subjective
        time.
    Reproduction: Interval delimited operatively by E; refers to
        objective time.

Judgment
    Verbal Estimation: Verbal estimate made by S; refers to subjective
        time.
    Production: Operative estimate made by S; refers to objective
        time.
    Reproduction: Operative estimate made by S; refers to objective
        time.

Elapsed time
    Verbal Estimation: Refers to the objective duration of the
        standard (operatively defined).
    Production: Refers to the objective duration of S's judgment
        (operatively defined).
    Reproduction: Refers to the objective duration of the standard as
        well as of S's judgment (operatively defined).


Description of Results


The results of time estimation experiments have been described in a
number of different ways. “Over- or underestimation of the standard”
(5), “over- or underestimation of elapsed time” (2, 3), “the relative
speed of the ‘internal’ and ‘external’ clocks,” and “the relative
magnitude of the subjective and objective temporal units” (4) are
alternative expressions that have been used by different
investigators. Thus, a judgment larger than the standard obtained with
the method of verbal estimation may be described as denoting
overestimation of the standard, or overestimation of elapsed time,
while a similar result obtained with the method of production may be
described as denoting overestimation of the standard, or
underestimation of elapsed time. Some investigators present their
results only in terms of subjective and objective clocks or temporal
units. The exact relations between all these expressions have not been
adequately clarified. Table 2 attempts to show for the different
methods the significance of these various expressions with respect to
the relative magnitudes of the judgment and the standard.

Elapsed time refers to temporal durations as measured by standard
clocks (objective or “external” clocks). “Internal clock” refers to a
hypothetical mechanism in S, which is immediately and directly related
to performance in a time estimation task. The rate of this clock may
vary from individual to individual and in the same individual from
time to time. Subjective temporal units refer to subjective norms of
the magnitude of the duration of seconds and minutes, and are
presumably acquired through learning. These subjective units, like the
internal clock, may be presumed to vary from individual to individual
and in the same individual from occasion to occasion depending on
internal and external conditions. It is clear that the concepts of
subjective temporal unit and of internal clock are closely related.²


TABLE 2

SIGNIFICANCE OF THE RELATIONS BETWEEN THE JUDGMENT AND THE STANDARD IN
VARIOUS METHODS OF TIME ESTIMATION. THE FOUR ENTRIES IN EACH BOX ARE
EQUIVALENT STATEMENTS OF THE RELATION BETWEEN THE JUDGMENT AND THE
STANDARD

Judgments larger than the standard

    Verbal Estimation
        1. Overestimation of the standard
        2. Overestimation of elapsed time
        3. Internal clock faster than external clock
        4. Subjective temporal units smaller than objective temporal
           units

    Production
        1. Overestimation of the standard
        2. Underestimation of elapsed time
        3. Internal clock slower than external clock
        4. Subjective temporal units larger than objective temporal
           units

    Reproduction
        1. Overestimation of the standard
        2. Not applicable
        3. Internal clock slower than external clock during
           reproduction
        4. Subjective temporal units larger than objective temporal
           units during reproduction

Judgments smaller than the standard

    Verbal Estimation
        1. Underestimation of the standard
        2. Underestimation of elapsed time
        3. Internal clock slower than external clock
        4. Subjective temporal units larger than objective temporal
           units

    Production
        1. Underestimation of the standard
        2. Overestimation of elapsed time
        3. Internal clock faster than external clock
        4. Subjective temporal units smaller than objective temporal
           units

    Reproduction
        1. Underestimation of the standard
        2. Not applicable
        3. Internal clock faster than external clock during
           reproduction
        4. Subjective temporal units smaller than objective temporal
           units during reproduction

Three examples, one with each method, will help to clarify the
relations shown in Table 2. Consider the situation in which E, using
the method of estimation, presents a standard of 15 sec., and obtains
from S a judgment of 20 sec. The judgment being larger than the
standard, we say that S has overestimated the standard. Since in the
method of estimation, the standard and elapsed time refer to the same
thing (Table 1), that is, to the objective duration of the interval
delimited by E, we can also say that S has overestimated elapsed time.
The fact that S thought more seconds had elapsed than actually had, in
the given duration of the standard, means that his subjective temporal
units were smaller than objective temporal units. This is equivalent
to saying that S’s internal clock ran faster than the objective clock.

Next, consider the situation in which E, using the method of
production, obtains from S the same judgment of 20 sec., when the
verbally stated standard is again 15 sec. Since the judgment is larger
than the standard, we can say that S has overestimated the standard.
Since, in the method of production, elapsed time refers to the
duration of the interval delimited by S (Table 1), we can also say
that he has underestimated elapsed time. The fact that S thought he
had produced 15 sec. while he actually produced 20 sec. (since 15 of
S's subjective seconds elapsed in the course of 20 objective seconds)
means also that S's subjective temporal units were larger than
objective temporal units. This is equivalent to saying that his
internal clock ran at a slower rate than the objective clock.

² The authors are grateful to Dr. Peter Milner for clarification of
some of these relations.

Finally, consider the situation in which E, using the method of
reproduction, delimits an interval of 15 sec. and requires S to
reproduce an interval of the same duration. If S's reproduction is 20
sec., his judgment is larger than the standard; he has overestimated
the standard. But since in the method of reproduction, elapsed time
refers both to the duration delimited by E and the duration delimited
by S (Table 1), we cannot make any meaningful statement regarding
over- or underestimation of elapsed time. Further, in the method of
reproduction, the relation between the magnitude of the judgment and
the subjective temporal units or the rate of the internal clock is not
as simple as in the other methods. In the other methods, over- or
underestimation necessarily implies that the S's subjective temporal
unit (and the rate of internal clock) is different from objective
temporal units (and the rate of external clocks). But in the case of
the method of reproduction, whether S's internal clock runs faster or
slower than the external clock, he may still reproduce the duration of
the standard quite accurately, for his subjective temporal units are
not likely to change from the time that he is exposed to the standard
to the time that he reproduces it. However, if, as in our example, S
does give a judgment larger than the standard, it can be said that S's
subjective temporal units were not only larger than objective units at
the time of the reproduction, but were also larger than his subjective
units at the time of the presentation of the standard. This is
equivalent to saying that, during reproduction, his internal clock ran
slower relative, not only to the external clock, but relative also to
his own internal clock at the time of the presentation of the
standard. In the method of reproduction, then, the relation between
the magnitudes of judgment and standard cannot be as readily described
in terms of subjective temporal units or internal clock as in the case
of the other methods.
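The relations illustrated by the three examples can be gathered into one small Python function, offered here only as a sketch of Table 2. The function name `interpret`, the dictionary keys, and the choice to return `None` when judgment and standard are equal (a case Table 2 does not cover) are assumptions of this illustration.

```python
def interpret(method, standard, judgment):
    """Restate a judgment/standard pair in the four vocabularies of Table 2.

    Returns None when judgment equals standard, a case outside Table 2.
    """
    if judgment == standard:
        return None
    larger = judgment > standard
    result = {"standard": "overestimated" if larger else "underestimated"}
    if method == "verbal estimation":
        # Standard and elapsed time refer to the same objective duration.
        result["elapsed_time"] = result["standard"]
        clock_faster = larger
    elif method == "production":
        # Elapsed time is the duration S produced, so the sense reverses.
        result["elapsed_time"] = "underestimated" if larger else "overestimated"
        clock_faster = not larger
    elif method == "reproduction":
        result["elapsed_time"] = "not applicable"
        clock_faster = not larger
    else:
        raise ValueError(f"unknown method: {method}")
    result["internal_clock"] = "faster" if clock_faster else "slower"
    result["subjective_units"] = "smaller" if clock_faster else "larger"
    # In reproduction the clock statements hold only during reproduction.
    result["qualifier"] = "during reproduction" if method == "reproduction" else ""
    return result


# The three worked examples: standard of 15 sec., judgment of 20 sec.
for method in ("verbal estimation", "production", "reproduction"):
    print(method, interpret(method, 15, 20))
```

Running the three examples shows the same numerical difference yielding a faster internal clock under verbal estimation and a slower one under production and reproduction.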
Thus we see that a given difference between standard and judgment may
signify quite different underlying events in the different methods. An
identical difference may signify faster internal clock in the method
of verbal estimation, slower internal clock in the method of
production, and slowing down of the internal clock in the method of
reproduction. Stating results in terms of the relative magnitudes of
standard and judgment (over- or underestimation of the standard) is
satisfactory only when all data are obtained with the same method.
When, in the course of reviewing studies and making generalizations,
it becomes necessary to compare results obtained with different
methods and to theorize about the exact mechanisms underlying time
estimation, it would seem desirable to restate the results in terms of
the relative speeds of internal and external clocks or relative
magnitudes of subjective and objective temporal units.


A number of recent investigations
have drawn attention to the part 
played by the relative difficulty of 
two or more tasks in the degree of 
transfer from one task to another. 
Results from these experiments have 
important implications for practical 
issues in training as well as for trans- 
fer theory. Since, however, there are 
certain inconsistencies in defining 
difficulty as well as discrepancies 
among the experimental results this 
area of investigation is in need of 
critical evaluation. The main pur- 
pose of this review, therefore, is to 
bring within the scope of one paper 
evidence from numerous experiments 
which have been directly or indirectly 
concerned with the effect of task dif- 
ficulty on transfer in skilled perform- 
ance and to subject this evidence to 
critical analysis. It should be men- 
tioned in this connection that a num- 
ber of the experiments dealt with here 
have not been primarily concerned 
with task difficulty in relation to 
transfer, but the evidence has arisen 
more or less incidentally during the 
course of the experiment. Also, this 

_ Paper will be limited to a considera- 
tion of recent investigations within 
the field of human skill, although it is 
known that some earlier work (6) has 
drawn attention to the relationship 
existing between task difficulty and 
transfer of training. 

Essentially, a skilled task in its 
simplest form possesses three basic 
features. These are: (a) a stimulus 
complex sometimes called the dis- 
play, (b) devices by means of which 
elements in the display are brought 
under control by the responses of the operator, and (c) a linkage
between these two. This broad analysis applies whether the task is a
simple one such as writing with a pen upon paper, or a very much more
complex task such as is involved in various kinds of tracking
behavior. The difficulty of a task may vary as a function of changes
in one or more of these features. The manner in which task difficulty
has been varied within each of these task features will be treated
first in this paper. This will be followed by a review of results
obtained after which certain methodological and theoretical issues
will be taken up.

For the most part the general procedure used in experiments of this
nature involves either training a number of matched groups under
different conditions of task difficulty followed by performance under
a different condition, or the use of the AB paradigm or an expansion
of it in which each S undergoes each experimental condition. Initial
task difficulty is usually assessed from the mean score for a number
of trials, the score achieved on the final trial, or the mean number
of trials required to reach a certain criterion. The method of
estimating the degree of transfer also varies between studies.


METHODS USED TO VARY
TASK DIFFICULTY


Stimulus Variations


Barch (3) using a following tracking task (Modified Two-Hand
Coordination Test) varied difficulty by changing target size. Using
the same task Morin (17) varied difficulty in the same manner. A
Rudder Control Test (Model CM120C) was employed by Gagné and Bilodeau
(9). Variations in task difficulty were introduced by changing the
width of the on-target scoring area. In an experiment by Szafran and
Welford (23) in which Ss were required to throw loops of chain into a
box the display was varied in three ways in order to obtain three
levels of difficulty. Under the first condition Ss threw directly into
the box, under the second they threw over a bar placed between S and
target, and under the third condition a mirror was placed behind the
box and a screen before it. Under the last condition it was necessary
to use the mirror in aiming since the screen blocked direct viewing.
In this experiment variations in display led necessarily to changes in
method of throwing (responding) according to whether S threw directly
or over the bar. Alterations in target speed in a following tracking
task led to variations in task difficulty in an experiment by Lincoln
and Smith (14). Changes in target speed necessitated corresponding
changes in speed of responding. In an experiment by Andreas et al. (1)
task difficulty was varied by altering the number of moving elements
in the display of a tracking task. This was accomplished by employing
a following (SAM Two-Hand Pursuit Test) and a compensatory (SAM
Two-Hand Coordination Test) task. Under the latter condition changes
in display-control linkage were also introduced. A motor-discrimination
task in which difficulty was varied by changing the nature of
discrimination training during the initial task was used by Gagné et
al. (7). Position and color discriminations only were used during the
initial task and in the final phase both kinds of discrimination were
involved.


Response Variations 


Gibbs (10) used a hand-wheel con- 
trolled compensatory tracking task 
in which difficulty varied as a func- 
tion of hand-wheel diameter and in 
which the smaller hand-wheel re- 
sulted in a more difficult task. An- 
other method of varying task diffi- 
culty used by Gibbs (10) was that of 
changing the complexity of the path 
to be followed in a following tracking 
task. It is conceivable, of course, 
that a more complicated path pro- 
vided greater perceptual difficulties 
as well as demanding more complex 
responses. A lever and a pressure 
control were also used to vary difficulty
in another investigation by
Gibbs (11) using a compensatory 
tracking task. The tracking task 
used by Baker et al. (2) was of the fol- 
lowing variety in which variations in 
gear ratios between hand-wheel con- 
trol and follower led to changes in 
the speed of movement of the fol- 
lower relative to hand-wheel turning 
rate. 

Variations in Control-Display Linkage 

In the experiment by Barch (3) 
above, task difficulty was varied by 
means of a complete and a partial re- 
versal of control-display relationship 
used in the standard form of the task.
A change from a “natural” or “expected”
to an “unnatural” or “unexpected”
relationship resulted in an
increase in task difficulty in an experiment
by Gibbs (10). A following
tracking task (Iowa Pursuit Apparatus)
in which difficulty was varied
in four ways using standard, reversed,
and two partially reversed display-control
linkages was employed by
Barch and Lewis (4). Under the
condition of partial display-control
linkage reversal either the left or
right hand linkage was reversed. It
is interesting to note that results from 
this experiment are contrary to those 
obtained by Gibbs (10). Lincoln (13) 
varied task difficulty by using direct, 
velocity, and aided controls. 


RESULTS OF EXPERIMENTS 


At this point in the discussion it is 
as well to summarize briefly the gen- 
eral trend of the experimental results 
when task difficulty is varied in the 
numerous ways outlined in the previ- 
ous section. The effects of relative 
task difficulty on transfer when difficulty is changed along a
stimulus dimension are mainly negative when the stimulus alone is
varied. In three [...] difficulty in terms of target speed (14) also
failed to give greater transfer to an easier task than did the
opposite order. Increasing task difficulty by changing from a
following to a compensatory tracking task (1) did result in greater
transfer when the difficult task was practiced initially, but only
when the second task had an “unnatural” control-display arrangement.
This effect was not observed when the final task control-display
relationship was “natural” but still less difficult than the first
task. In an aiming experiment (23) increasing the difficulty of the
task by increasing the complexity of the display did result in
greatest transfer in the difficult to easy direction. In this
experiment, however, the variations in the perceptual aspects of the
task led to changes in the mode of responding. This makes it difficult
to attribute the differential transfer effects to stimulus variation
alone. Such an argument applies, of course, to other experiments in
which variations in the stimulus situation may have given rise to
unmeasurable changes in the mode of response. It does appear, however,
to be especially pertinent in the case of the aiming experiment. In a
motor-discrimination task (8) transfer of training to a total task was
greater when the more difficult of the two forms of discrimination was
practiced first. It must be borne in mind, however, that in this
experiment transfer of training from the components of a task to a
total task comprising a combination of the two initial forms of the
task was dealt with, rather than transfer from one task condition to
another. The results from this investigation may not be altogether
comparable with other experiments in which the stimulus situation was
varied.

¹ The results from an experiment by Green published after the
completion of this paper in which target size was varied over a wide
range are in agreement with these previous findings. The task used was
a following tracking task. (Green, R. F., Transfer of skill on a
following tracking task as a function of task difficulty (target
size). J. Psychol., 1955, 39, 355-370.)

A greater degree of transfer from a difficult to a less difficult
condition than from a less difficult to a difficult condition is met
with more consistently when task difficulty is varied with respect to
response variables. When the task was made more difficult by altering
hand-wheel size and course complexity (10), method of control (11),
and rate of responding (2), greatest transfer resulted when this form
of the task was practiced initially.

Two experiments (10, 4) have 
dealt with the effects on transfer of 
task difficulty varied with respect to 
the spatial relationship between con- 
trol and display in tracking tasks. 
Whereas one of these investigations 
(10) has demonstrated that greater 
transfer occurred in the difficult to 
easy direction than in the opposite 
order, the other experiment (4) failed 
to note this effect. The fundamental 
differences in the design, extent of 
control-display relationship reversal,
nature of task, and number of control
devices must be considered as
possible reasons for the inconsistency
between these two sets of results.
Variation in difficulty as a result of
altering the nature of the control-display
linkage in tracking (13) also
failed to affect transfer of training
differentially with respect to the degree
of difficulty of initial and final
tasks.

It is necessary to direct attention 
to the fact that the transfer phenome- 
non under consideration here has 
been investigated in connection with 
only a limited number of factors 
which are known to affect the diffi-
culty level of a skilled task. Thus,
with respect to response variables,
such factors as hand-wheel inertia,
friction, and aiding time-constants
(12), relationship between display
and control (15), control-crank ra-
dius (22), handedness (21), and
planes of operation (18), have been
systematically investigated in rela-
tion to the extent to which these fac-
tors affect ease of control. In the case
of stimulus variables, target dimen-
sions (15), visual magnification of
target (12), structure of target sur-
round (20), as well as others, have
been examined in relation to task dif- 
ficulty. A systematic examination of 
such variables from the point of view 
of differential transfer effects in 
terms of the order of the difficult and 
easy conditions would be of consid- 
erable practical value, as well as a 
contribution to an understanding of 
the transfer process in skilled per- 
formance. 


TASK DIFFICULTY
The Isolation and Control of Variables 


Ordinarily the relative difficulty of 
two or more tasks is defined in terms 
of the magnitude of mean scores 
achieved during whole or part of the 
training session or sessions, or, in 
terms of the magnitude of the score 
reached on a final trial. These scores 
are then subjected usually to statisti- 
cal examination in order to establish 
the significance of differences between 
them. Should these differences prove 
to be significant, then that form of 
the task revealing greatest mean ac- 
curacy, least error, least time of per- 
formance, or some such, is said to be 
the least difficult. In transfer studies 
of the kind reviewed here, it is of im- 
portance not only to define opera- 
tionally the relative difficulty of the 
tasks, but to establish as far as possi- 
ble the source or cause of task diffi- 
culty differences. This is an essential 
step if a theory is to be constructed 
to deal with the effects on transfer 
of relative task difficulty.

A problem which has become plain 
from the preceding review is the dif- 
ficulty of varying one task variable 
along a scale of difficulty without pro- 
ducing unmeasurable changes in 
other closely related variables. Even 
though the relative difficulty of the 
two tasks may have become obvious 
from an examination of performance 
scores, the exact manner in which
task difficulty arose, or to which task 
variable task difficulty can be attrib- 
uted, has not always been clear. In 
short, in many of the experiments 
discussed the locus of task difficulty 
has not always been clearly specifi- 
able. It should be remembered, how- 
ever, that the stimulus or response 
conditions probably never remain the 
same from trial to trial, but, as Os- 
good (19) has been careful to show, 
stimulus and response may retain 
identical functions during learning. 
Even allowing for this fact many of 
the experiments on the effects of task 
difficulty on transfer have failed to 
control adequately the source of dif- 
ficulty. 

In the experiment by Gibbs (10) 
in which a following tracking or 
“steering” task was used not only 
course complexity, but also the com- 
plexity of the responses demanded by 
the course, varied under the two 
conditions. The more difficult of the 
two tasks presented the operator with 
a task which required responses of 
greater complexity than the easier 
task, as well as with a stimulus situa- 
tion which may well have been more 
difficult perceptually. It is by no 
means easy in this case to state defi- 
nitely the separate contributions of 
each of these factors to the difficulty 
of the task, or to the extent of trans- 
fer. A similar problem is met with in 
the investigation of Lincoln and 
Smith (14) where variations in target 
speed may well have given rise to 
greater perceptual difficulty, as well 
as greater difficulty in responding. 
In the experiment by Szafran and
Welford (23) this same problem arises 
again. The “bar” condition not only 
demanded an alteration in the re- 
sponse by requiring the subject to 
throw over it, but changed as well the 
stimulus situation. It is not possible, 
of course, to state whether or not the
“bar” and “direct” conditions re-
mained functionally identical while
at the same time it is equally impossi-
ble to state definitely the locus of task
difficulty. A similar argument applies 
when the greater difficulty of the 
“screen” condition in this investiga- 
tion is considered. The two task con- 
ditions employed by Andreas et al. 
(1) varied not only with respect to the 
number of display elements, but also 
in relation to course and response 
complexity and control-display rela-
tionships. In this case it is not possi-
ble to state the individual contribu-
tions to task difficulty nor their possi-
ble interactions, all of which could
have affected the level of difficulty.

A further kind of problem in the control of variables has arisen
when changes along one stimulus or response dimension have led to
unmeasurable variations along a closely related dimension. For
example, Baker et al. (2) altered the level of task difficulty by
varying the rate of hand-wheel turning necessary to move the
target-follower through a certain distance. These authors have
pointed out that changes in turning rate altered also the extent
of movement as well as the required force of movement. Variations
in task difficulty could be due to any one of these factors or to
an interaction between two or all of these.

The principal problem, then, in many of the studies reviewed is
to state the locus of variations in task difficulty. Without
doing this, transfer of training of skill as a function of the
relative difficulty of the tasks is not easy to deal with
theoretically, since the actual experimental evidence remains
obscure. The problem of isolation and control of difficulty
variables in skilled tasks does not lend itself easily to
solution, since stimulus and response factors are so intimately
related, and therefore exceptionally difficult to isolate under
experimental conditions.


Task Difficulty and Performance Standards


In a number of skilled tasks the level of performance required is
frequently implied rather than defined by the task itself. This
point can be made clearer by illustration. If in a following
tracking task the target is 1 in. in diameter, the target
follower [illegible] in. in diameter, and the task of the
operator is to superimpose the latter upon the former, a greater
margin of error is permitted than if both [illegible] elements
are [illegible] in. in diameter. The extent of permissible error
is implied by the relative sizes of target and follower. In this
example, whereas holding the follower [illegible] upon the small
([illegible] in.) target need not be necessarily a more difficult
task than keeping it on a larger target, the structure of the
[illegible] situation in the latter case implies that there is a
wider margin of error in which the off-target extent does not
count as an error. This means that the two conditions mentioned
differ with respect to operationally defined “difficulty” insofar
as one condition demands higher performance standards rather than
a higher level of skill.

The more difficult task under these circumstances, then, is the
one which requires a higher standard of performance by setting
narrower error-tolerance limits. The two task conditions may not
necessarily differ, or differ only to a small degree, in the
actual degree of skill which they demand for [illegible]
performance. The differences, sometimes large, between
performance curves for two task conditions differing in this
manner may be due primarily to the fact that the operator is
directing to the task very little, or a great deal of effort,
since the task implies by its target and follower dimensions a
certain standard of performance. The difficulty differences
between the task conditions are more apparent than real.

Szafran and Welford (23) have
suggested that one possible explana- 
tion of the phenomenon of greater 
transfer from a difficult to an easy 
task may be found in the higher 
standards of performance established 
during the difficult initial task, and 
carried over to the easier final task. 
Transfer in this case would be ex- 
pected to be greater than in shifting 
from an easy initial task to a more 
difficult final task. The available ex- 
perimental evidence does little to sup- 
port this hypothesis. In the experi- 
ments reported (9, 17) dealing with 
the relative dimensions of target and
follower in following tracking tasks in
which the task difficulty was varied
by changing error tolerance limits,
transfer was positive and about equal
in going from difficult to easy and
from easy to difficult conditions. In 
another experiment (3) greater trans- 
fer was found in shifting from difficult 
to easy, but it has been pointed out 
that control-display relationship fac- 
tors were confounded with stimulus 
factors in this study and the experi- 
mental design considered only the 
easy to difficult direction. 

The concept of error-tolerance im-
plicit to the task as a determinant of
performance is not a new one. Mace
(16) has put forward an hypothesis
of “implicit standards” which is
summarized in the statement
“. . . subjects aiming at targets de-
fined to themselves a ‘good,’ ‘fair’ or
‘poor’ shot not in terms of its absolute
distance from the bull’s-eye, but in a
way which was relative to the form of
target employed” (16, p. 103). A simi-
lar view has been expressed by Hel- 
son (12) in his hypothesis of par or 
tolerance. 

The present indications are that 
when the difficulty of a task varies 
only in terms of the implied extent of 
error tolerated, neither condition call- 
ing for a higher level of skill, the phe- 
nomenon of greater transfer from the 
difficult to the less difficult condition 
than from the less to the more diffi- 
cult condition does not occur. So 
far this has only been observed in 
relation to the relative dimensions of 
target and follower in a tracking task. 
Much needs to be done with respect 
to both a variety of skilled tasks, and 
the various features of skilled tasks, 
before definite conclusions can be 
drawn or hypotheses clearly formu- 
lated. 


The U Hypothesis and Transfer 


Bartlett (5) and Helson (12) have 
each outlined hypotheses which state 
in effect that performance will re- 
main essentially the same over a cer- 
tain range of variation in the physi- 
cal characteristics of the task. Out- 
side this range performance tends to 
undergo considerable changes. Thus 
Bartlett states: “The fundamental 
features of performance will remain 
stable over a certain range of its con- 
ditions. Outside this range they will 
change often in a dramatic and radi- 
cal manner” (5, p. 444). Helson's 
statement is much the same: 


Human performance tends to be optimal 
as judged by accuracy, efficiency, and com- 
fort, over a more or less broad band of values 
for a given stimulus variable outside of which 
it becomes noticeably poorer. When perform- 
ance is plotted in terms of error or the recip- 
rocal of accuracy, the resultant curve is 
roughly U-shaped (12, p. 493). 


Helson has demonstrated such a 
curve for aiding time-constants, 
hand-wheel turning speed, and hand-
wheel size and inertia.

Employing this notion of the opti- 
mal band of performance Gibbs (10) 
has suggested that: 


... transfer between two tasks, one of 
which lies within the tolerance limits, and one 
of which lies outside this optimal zone can 
be (a) positive, (b) large, (c) unequal and 
unaffected by the order of task presentation; 
this implies greater transfer when the first 
task lies outside the optimal zone, and the 
second lies within it, than when the first task 
comes within and the second outside. 


This hypothesis lends itself readily 
to experiment, and since a consider- 
able amount of data is now available 
concerning the optimal band of per-
formance for a number of task varia-
bles, experiments designed to meas-
ure transfer from within to without
and from without to within the opti-
mal zone would be of great value.

A single U curve deals only with a 
single variable thus emphasizing 
again the need for careful isolation 
and control of task variables in trans-
fer experiments of this kind. Since
much of the evidence indicates that
the relative difficulty of initial and
final task conditions is a transfer
determinant of considerable impor-
tance it is essential now to study the
many factors varying in difficulty
and depicted by the U shaped curve.
This approach would doubtlessly pro-
vide data of fundamental importance
to practical and theoretical considera-
tions in the transfer of skill.

In conclusion, one further inadequacy in the experimental design
of many of the experiments so far mentioned needs to be pointed
out. Many of the experiments have been designed so that only the
difficult-to-easy, and easy-to-difficult task conditions have
been taken into consideration. It is again important, from both
practical and theoretical viewpoints, that the degree of transfer
from one condition to the same condition be observed as well. In
the experiment by Baker et al. (2) it was found that the greatest
degree of transfer occurred when there was no difference in
difficulty between the initial and final task. This means, of
course, that greatest facilitation took place when the initial
and final tasks were the same task. This is [illegible] rather
than transfer. Since a large number of the experiments outlined
in this paper failed to include the experimental conditions of
easy to easy and difficult to difficult, it is not easy to
generalize concerning the difficulty conditions of initial and
final task for maximum transfer.


SUMMARY 


This article has presented a sum- 
mary of a number of investigations 
concerned with the effect on transfer 
of training of the relative difficulty 
of initial and final tasks. The results 
from a number of recent studies are 
regarded as presenting an important 
problem for practical consideration 
and theoretical interpretation. The 
principal findings of these experi- 
ments have been briefly summarized. 
The concept of task difficulty has 
been discussed in relation to the isola- 
tion and control of task variables, 
subjective performance standards, 
and the U hypothesis. 


EYSENCK’S TENDER-MINDEDNESS DIMENSION:
A CRITIQUE 


MILTON ROKEACH AND CHARLES HANLEY
Michigan State University 


In a series of articles and books 
(1, 2, 3, 4, 5, 6), Eysenck has tried to 
demonstrate that individual differ- 
ences in social attitudes are reducible 
to “two primary social attitudes.” 
The first of these he calls “radicalism-
conservatism” and the second,
“tough-mindedness—tender-minded-
ness.” Our primary concern in this
paper is with the empirical and theo-
retical bases of the latter of these di-
mensions.

The origin of the two factors has 
been described by Eysenck: 


This scheme was derived by the writer from 
a factorial analysis of the responses to a 40- 
item attitude inventory made by 750 middle- 
class English subjects. . . . These were drawn 
equally from voters for the three major 
British political parties (conservative, liberal, 
and socialist), and the 250 subjects repre- 
senting each party were equated for age, sex, 
and education. It was shown that items hav-
ing high factor saturations on radicalism-
conservatism (R) also distinguished at a high
level of significance between voters for the
conservative and radical parties respectively,
while items having low saturations on R failed
to distinguish between the two parties. . . .
The tender-mindedness factor (T) was
found to be quite uncorrelated with R, and
to give no discrimination between the political
parties. On analyzing responses of additional
samples of communist and fascist party mem-
bers, however, it was found that both these
groups were differentiated with considerable 
accuracy from members of the three demo- 
cratic parties by their low T scores; both 
communists and fascists were thus found to be 
tough-minded in comparison with conserva- 
tives, liberals, and socialists (3, p. 563). 


Eysenck describes the bipolar ten- 
der-mindedness—tough-mindedness 
factor primarily in terms of the items
which measure it (1). The reader
may refer to the left-hand column of
Table 1 for the 14 T items and the di-
rection in which they are to be scored.
In attempting to find an underlying
concept corresponding to this factor,
Eysenck uses William James's di-
chotomy of tender-mindedness vs.
tough-mindedness. Eysenck notes
“one set of opinions appears to be
dominated by ethical, moralistic,
super-ego, altruistic values; the other
by realistic, worldly, egotistic val-
ues” (1, p. 61).

1 It should be understood that whenever
this term is used, we follow strictly Eysenck’s
own usage.

An examination of Eysenck’s writ- 
ings reveals that he explains, or re- 
interprets, a great body of research 
on the organization of social attitudes 
and their relation to personality in 
terms of the R and T factors. Thus, 
a considerable portion of his recent 
book, The Psychology of Politics (5), 
is devoted to just this undertaking. 

In view of the importance which 
Eysenck attaches to the factor he 
labels tender-mindedness, and the
far-reaching conclusions he draws re-
garding differences on this dimension
among adherents to various political
positions ranging from the Right to
the Left, we felt that a careful scru-
tiny of his data was in order. On the
basis of our analysis, we have arrived
at the opinion that the evidence
Eysenck himself presents to support
these conclusions contains grave con-
tradictions and errors in computa-
tion, and that the concept itself, at
least as employed by Eysenck, con-
tributes little or nothing to the scien-
tific understanding of the social phe-
nomena to which he addresses him-
self.
TABLE 1

FREQUENCY OF RESPONSES BY CONSERVATIVES, LIBERALS, SOCIALISTS, AND COMMUNISTS
TO THE ITEMS ON THE TENDER-MINDEDNESS SCALE

Proportion of “yes” answers

Item and direction of scoring | Class | Cons. | Lib. | Soc. | Com.

Items on which communists are most tender-minded:

1. Coloured people are innately inferior to white people. (T—) | Middle | .42 | [illegible] | .19 | .00
 | Working | .30 | .33 | .16 | .06
3. War is inherent in human nature. (T—) | Middle | .67 | .57 | .34 | .02
 | Working | .70 | .67 | .60 | .02
5. Persons with serious hereditary defects and diseases should be compulsorily sterilized. (T—) | Middle | .69 | .59 | .63 | .46
 | Working | .96 | .83 | .89 | [illegible]
8. In the interests of peace, we should give up part of our national sovereignty. (T+) | Middle | .30 | .60 | .76 | .74
 | Working | [illegible] | .38 | .50 | .65
10. It is wrong that men should be permitted greater sexual freedom than women by society. (T+) | Middle | .66 | .71 | .80 | .93
 | Working | [illegible] | [illegible] | .76 | .91
13. Conscientious objectors are traitors to their country, and should be treated accordingly. (T—) | Middle | .28 | .16 | .09 | .02
 | Working | .67 | [illegible] | .27 | .06
36. The death penalty is barbaric, and should be abolished. (T+) | Middle | .30 | .42 | [illegible] | [illegible]
 | Working | .19 | [illegible] | .20 | .83
39. The Japanese are by nature a cruel people. (T—) | Middle | .58 | .37 | .19 | [illegible]
 | Working | .74 | [illegible] | .27 | [illegible]

Items on which communists are most tough-minded:

9. Sunday-observance is old-fashioned, and should cease to govern our behaviour. (T—) | Middle | .36 | .44 | .68 | [illegible]
 | Working | .59 | .33 | .69 | [illegible]
15. The laws against abortion should be abolished. (T—) | Middle | .28 | .40 | .53 | [illegible]
 | Working | .33 | [illegible] | [illegible] | [illegible]
16. Only by going back to religion can civilization hope to survive. (T+) | Middle | .65 | .56 | .36 | .09
 | Working | .74 | .61 | [illegible] | [illegible]
23. Divorce laws should be altered to make divorce easier. (T—) | Middle | .33 | .42 | .61 | [illegible]
 | Working | .37 | .22 | .59 | [illegible]
28. It is right and proper that religious education in schools should be compulsory. (T+) | Middle | .66 | .55 | .32 | [illegible]
 | Working | .70 | .78 | [illegible] | [illegible]
29. Men and women have the right to find out whether they are sexually suited before marriage (e.g., by companionate marriage). (T—) | Middle | .35 | .40 | .62 | [illegible]
 | Working | .37 | .22 | .36 | [illegible]

The basis for our opinion rests
primarily on an examination of the
data which Eysenck presents in two
papers which form the foundation of
his work with the T factor. In the
earlier of these publications (1),
Eysenck reports the factorial analy-
sis alluded to earlier, and in the sec-
ond (2), he presents data regarding
the performance of middle- and work-
ing-class members of the conserva-
tive, liberal, socialist, and communist
parties. The mean values of T for
these groups, as given by Eysenck
(2), are shown in Table 2 under the
heading “Original mean.”


TABLE 2
MEAN TENDER-MINDEDNESS SCORES OF MEMBERS OF VARIOUS BRITISH
POLITICAL PARTIES AND SOCIOECONOMIC CLASSES

Party | Class | N | Original mean | Recomputed mean
Conservative | Middle | 250 | 7.6 | 7.6
 | Working | 65 | 6.3 | 6.7
Liberal | Middle | 250 | 7.9 | 8.2
 | Working | 27 | 7.4 | 8.3
Socialist | Middle | 250 | 8.0 | 8.0
 | Working | 45 | 6.2 | 6.6
Communist | Middle | 50 | 6.8 | 7.4
 | Working | 96 | 6.0 | 7.3


There are 14 items on the T scale;
the subject responds to each item
with a ++, +, 0, —, or ——, ac-
cording to the degree of his agreement
or disagreement with the content of
the item. If the subject responds to
all items in the tender-minded direc-
tion (e.g., ——, or —, to Item 1 in
our Table 1), he obtains a score of 14.
If, on the other hand, he answers all
14 items either in the tough-minded
direction, or with “0,” he gets a score
of zero on the scale. From Table 2
it is seen that, according to Eysenck,
the middle- and working-class com-
munists are the least tender-minded
of the four political groups. He indi-
cates further that this difference is
statistically significant.

2 Means for seven fascists were also pre-
sented, unaccompanied by any breakdown of
responses to individual items. Hence, we omit
consideration of this small sample.
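
The scoring rule just described can be illustrated with a minimal Python sketch. It rests on an assumption the text only implies: each item answered in the keyed (tender-minded) direction earns one point regardless of intensity, and a "0" never earns a point. The names `t_score`, `KEYS`, `all_tender`, and `all_neutral` are hypothetical, and the ASCII strings "-" and "--" stand in for the printed minus signs.

```python
# Sketch of the T-scale scoring rule as described in the text.
# Assumption: one point per item answered in the tender-minded
# direction, whatever the intensity; "0" is never tender-minded.

AGREE = {"++", "+"}
DISAGREE = {"--", "-"}

# Scoring directions of the 14 T items, as given in Table 1.
KEYS = {
    1: "T-", 3: "T-", 5: "T-", 8: "T+", 10: "T+", 13: "T-", 36: "T+",
    39: "T-", 9: "T-", 15: "T-", 16: "T+", 23: "T-", 28: "T+", 29: "T-",
}


def t_score(responses, keys=KEYS):
    """Count the items answered in the tender-minded direction.

    responses maps item number -> one of "++", "+", "0", "-", "--".
    """
    score = 0
    for item, key in keys.items():
        answer = responses[item]
        if key == "T+" and answer in AGREE:
            score += 1
        elif key == "T-" and answer in DISAGREE:
            score += 1
    return score


# A subject who answers every item in the tender-minded direction,
# and one who answers "0" throughout.
all_tender = {i: ("+" if k == "T+" else "-") for i, k in KEYS.items()}
all_neutral = {i: "0" for i in KEYS}

print(t_score(all_tender))   # maximum score
print(t_score(all_neutral))  # minimum score
```

Under these assumptions the two extreme cases reproduce the scores of 14 and zero stated in the text.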

It is reasonable to suppose, in view 
of these means, that the communists 
would tend to show up as least ten- 
der-minded on each of the 14 items 
measuring the T dimension. For- 
tunately, Eysenck has presented (2, 
p. 203) the percentage of agreement 
with each of the 14 T items, broken 
down according to socioeconomic 
and political groupings. On inspect- 
ing these data, we found that the 
communists were the most tough- 
minded of the groups on six of the 14 
items, but were the most tender-
minded of all on the remaining eight
items. These findings, furthermore, 
obtained for both the middle- and 
working-class samples. In our Table 
1, we have reproduced Eysenck’s 
data on this point, rearranging it into 
two sections: (a) items on which com- 
munists were the most tender- 
minded, and (b) items on which com- 
munists were the most tough-minded. 

It should come as no surprise to 
find that British communists are 
most tough-minded on six of the 14 
T items, if by tough-minded one 
means that they are opposed to Sun- 
day observance, favor the abolition 
of laws against abortion, are anti- 
religion, believe that divorce laws 
should be liberalized, are against 
compulsory religious education in the 
schools, and favor companionate 
marriage. Nor does it surprise us to 
find these same communists are most 
tender-minded on the remaining eight 
items, if by tender-minded one means 
that they reject the idea that colored
people are inferior, disagree with the
notion that war is inherent in human
nature, oppose compulsory steriliza-
tion, are willing to give up national
sovereignty, reject the double stand- 
ard, oppose the idea that conscien- 
tious objectors are traitors, favor the 
abolition of the death penalty, and 
reject the proposal that the Japanese 
are inherently cruel. The conclusions 
we draw from these results, how- 
ever, are considerably at variance 
from those Eysenck arrives at on the 
basis of differences in means on T (cf. 
“Original mean” in Table 2). Fur- 
thermore, we note Eysenck in coming 
to his conclusions makes no reference 
whatever to these data on the indi- 
vidual items. This oversight is in 
sharp contrast to the fact that he 
does not hesitate to point to such 
group differences in response to indi- 
vidual items as supporting the va- 
lidity of the radicalism-conservatism 
factor (1, p. 60-61; see also our quota- 
tion from Eysenck at the beginning 
of this paper). 

It is seen then, that while middle- 
and working-class communists are 
the most tender-minded of all the 
groups on eight of the 14 items, 
Eysenck’s mean scores on T show 
them as significantly more tough- 
minded than conservatives, liberals, 
and socialists. It was difficult for us 
to reconcile such findings; therefore, 
we proceeded to recompute the 
means for the various samples. This 
is easily done using the information 
contained in Table 1 on the percent- 
age of agreement with each item. The 
simplest way of doing this is to add 
the percentage values of tender- 
minded responses for all 14 items for 
a given group. It is necessary, of 
course, to keep in mind the direction 
of the scoring. For items on which 
disagreement indicates tender-mind- 
edness, the percentage frequencies 
must be subtracted from 100 per cent 
in order to get the correct value to 
use in recomputing the means. Thus, 
the mean for the middle-class, con-
servative group is recomputed by
summing the following tender-mind-
edness proportions obtained from the
data in Table 1: (1, T—) .58; (3, T—)
.33; (5, T—) .31; (8, T+) .32; (10,
T+) .66; (13, T—) .72; (36, T+)
.30; (39, T—) .42; (9, T—) .64; (15,
T—) .72; (16, T+) .65; (23, T—)
.67; (28, T+) .66; and (29, T—) .65.
Their sum rounds to 7.6, the mean of
the middle-class, conservative sam-
ple. The same value may be obtained
with a bit more labor by the following
procedure: (a) Find the number of
tender-minded responses to each item
by multiplying each of the above pro-
portions by 250, the number of Ss
in the conservative group; (b) sum
these values; (c) divide this sum by
250.
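
The recomputation just described can be checked with a short Python sketch. The fourteen tender-mindedness proportions are those listed in the text for the middle-class conservative group, and 250 is the size of that group; the function names `tender_proportion`, `mean_by_summing`, and `mean_by_counts` are hypothetical labels for the two procedures.

```python
# (item number, scoring direction, tender-minded proportion) for the
# middle-class conservative group, as listed in the text.
MIDDLE_CLASS_CONSERVATIVES = [
    (1, "T-", 0.58), (3, "T-", 0.33), (5, "T-", 0.31), (8, "T+", 0.32),
    (10, "T+", 0.66), (13, "T-", 0.72), (36, "T+", 0.30), (39, "T-", 0.42),
    (9, "T-", 0.64), (15, "T-", 0.72), (16, "T+", 0.65), (23, "T-", 0.67),
    (28, "T+", 0.66), (29, "T-", 0.65),
]
N_SUBJECTS = 250  # size of the middle-class conservative group


def tender_proportion(p_yes, direction):
    """Turn a proportion of "yes" answers into a tender-minded proportion.

    For T+ items agreement is tender-minded; for T- items the "yes"
    proportion is subtracted from 1 (i.e., from 100 per cent).
    """
    return p_yes if direction == "T+" else 1.0 - p_yes


def mean_by_summing(rows):
    """Simplest procedure: add the 14 tender-minded proportions."""
    return sum(p for _, _, p in rows)


def mean_by_counts(rows, n):
    """Longer procedure: (a) counts per item, (b) sum, (c) divide by n."""
    counts = [p * n for _, _, p in rows]
    return sum(counts) / n


print(round(mean_by_summing(MIDDLE_CLASS_CONSERVATIVES), 1))
print(round(mean_by_counts(MIDDLE_CLASS_CONSERVATIVES, N_SUBJECTS), 1))
# Item 1 is T-: .42 "yes" in Table 1 becomes .58 tender-minded.
print(round(tender_proportion(0.42, "T-"), 2))
```

Both procedures give the same figure, since multiplying every proportion by 250 and then dividing the sum by 250 leaves the sum of proportions unchanged.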
The values obtained from our recomputation are shown in the
“Recomputed mean” column of Table 2. A comparison of our means
with Eysenck’s reveals the following:

1. In only two out of eight comparisons are the means identical.

2. Both middle- and working-class communist means shift markedly
toward the tender-minded direction (i.e., toward the values found
for the other samples). In the case of the middle-class
communists, the shift is from 6.8 to 7.4; with the working-class
communists the change is even more dramatic, from 6.0 to 7.3.

3. The order of the middle-class groups in relation to
tender-mindedness is unchanged, but the difference between
communists and conservatives is indeed slight, 7.4 vs. 7.6.

4. With the working-class groups the recomputed means indicate
that the communists are more tender-minded than either
conservatives or socialists, a drastic change from Eysenck’s
original finding.



It is conceivable that the discrep-
ancies appearing in Table 2 arise be-
cause our method of scoring diverges
in one respect from Eysenck’s. He
always scores “0” responses as tough-
minded, while we, because Table 1
gives only frequency of agreement,
are forced to treat “0” answers to
certain items (e.g., Item 1 and all
others scored in the negative direc-
tion) as tender-minded. We do not
believe that this difference in scoring 
technique accounts for the discrep- 
ancies. Assume first that the fre- 
quency of “0” responses is the same 
in all political groups. Then, the re- 
computed means should (a) be uni- 
formly higher than those Eysenck 
reports, and the magnitude of in- 
crease should be approximately the 
same for all samples; and (b) the 
order of the groups on the T dimen- 
sion should remain the same. Neither 
of the preceding occurs. With respect 
to a, in the middle-class groups the 
recomputed means for the conserva- 
tives and socialists show no change 
at all; further, the magnitude of in-
crease in means is greatest in both
middle- and working-class groups for 
the communists. With respect to b 
above, the order of the groups is strik- 
ingly altered in the working-class 
samples. Thus, it is clear that the dis- 
crepancies cannot be explained if we 
assume equal frequencies of “0”
responses. 
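
The two checks made in this paragraph (whether the increases are uniform, and whether the order of the groups is preserved) can be reproduced from the Table 2 means with a brief Python sketch. The dictionary and function names are hypothetical; the figures are the original and recomputed means given in Table 2.

```python
# Original and recomputed T means from Table 2, keyed by (party, class).
ORIGINAL = {
    ("Conservative", "Middle"): 7.6, ("Conservative", "Working"): 6.3,
    ("Liberal", "Middle"): 7.9, ("Liberal", "Working"): 7.4,
    ("Socialist", "Middle"): 8.0, ("Socialist", "Working"): 6.2,
    ("Communist", "Middle"): 6.8, ("Communist", "Working"): 6.0,
}
RECOMPUTED = {
    ("Conservative", "Middle"): 7.6, ("Conservative", "Working"): 6.7,
    ("Liberal", "Middle"): 8.2, ("Liberal", "Working"): 8.3,
    ("Socialist", "Middle"): 8.0, ("Socialist", "Working"): 6.6,
    ("Communist", "Middle"): 7.4, ("Communist", "Working"): 7.3,
}


def shifts():
    """Increase of each recomputed mean over the original mean."""
    return {k: round(RECOMPUTED[k] - ORIGINAL[k], 1) for k in ORIGINAL}


def ranking(means, social_class):
    """Parties of one class, from most to least tender-minded."""
    rows = [(m, party) for (party, cls), m in means.items() if cls == social_class]
    return [party for m, party in sorted(rows, reverse=True)]


def largest_shift(social_class):
    """Party whose mean rises most within one class."""
    s = {p: v for (p, cls), v in shifts().items() if cls == social_class}
    return max(s, key=s.get)


for key, value in shifts().items():
    print(key, value)
print(largest_shift("Middle"), largest_shift("Working"))
print(ranking(ORIGINAL, "Working"))
print(ranking(RECOMPUTED, "Working"))
```

If "0" responses were equally frequent in all groups, the printed shifts would be roughly equal and the two working-class rankings would match; with the Table 2 figures neither holds.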

Another possibility comes to mind. 
Perhaps the communists show the 
greatest increases in recomputed 
means because they give relatively 
more “0” responses than do the other 
groups. This could account for the 
discrepancies. There is only indirect 
evidence bearing on this point, and it 
leads us to doubt that such is the 
case. Eysenck states that extreme 
responses to the items are more com- 
mon in his communist samples. Thus, 
the communists have “a greater 
tendency to believe strongly in the 
correctness of the attitude held” (5,
p. 140). We must conclude, there- 
fore, that the discrepancies between 
our means and Eysenck’s cannot be 
explained away by appealing to the 
manner in which we have had to 
handle “0” responses. 

In view of the foregoing analysis, 
Eysenck’s continued contention (2, 
3, 4, 5) that communists are more 
tough-minded than conservatives, 
liberals, and socialists, is not sup- 
ported by his published data. 


THE NATURE OF THE TENDER- 
MINDEDNESS ITEMS 


We do not consider it constructive 
to terminate our analysis at this 
point. What conclusions can be 
validly drawn from these data? 

When one plots the factor satura- 
tions given by Eysenck (1) for the 
14 items, it is immediately apparent 
that a rotation of approximately 45 
degrees produces a striking increase 
in the number of items with near- 
zero saturations on one of the two 
rotated factors. Inspection of the 
items with high positive or high nega- 
tive saturations on these rotated 
factors indicates clearly that two 
kinds of content are involved which 
are strikingly similar to Ferguson's
factors of “religionism” and “human-
itarianism” (7) and Kirkpatrick's
dimensions of “religiosity” and “hu-
manitarianism” (8).
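
The effect of a 45-degree rotation on a two-factor plot can be sketched in Python as follows. The loadings used here are invented purely for illustration and are not Eysenck's published saturations, which are not reproduced in this paper; the names `rotate` and `HYPOTHETICAL_LOADINGS` are likewise hypothetical.

```python
import math


def rotate(loading, degrees):
    """Re-express a (factor 1, factor 2) loading on axes rotated by `degrees`."""
    x, y = loading
    t = math.radians(degrees)
    return (x * math.cos(t) + y * math.sin(t),
            -x * math.sin(t) + y * math.cos(t))


# Invented loadings on the unrotated (R, T) axes, for illustration only.
HYPOTHETICAL_LOADINGS = {
    "item A": (0.5, 0.5),
    "item B": (-0.5, 0.5),
}

for name, loading in HYPOTHETICAL_LOADINGS.items():
    new_1, new_2 = rotate(loading, 45)
    print(name, round(new_1, 3), round(new_2, 3))
```

An item loading moderately on both original factors falls close to one of the rotated axes, so that its saturation on the other rotated factor is near zero; the sum of squared loadings of each item is unchanged by the rotation.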

The positive pole of the “religion-
ism” factor is indicated by Items 16
(going back to religion) and 28 (com-
pulsory religious education), the neg-
ative pole by Items 9 (Sunday observ-
ance old-fashioned), 15 (abolish laws
against abortion), 23 (liberalize di-
vorce laws), and 29 (companionate
marriage). It develops that the six
T items on which communists are
the most “tough-minded” are just
these six “religionism” items. The
communists take a clearly antireli-
gious position.

3 The remainder of the items on Eysenck’s
questionnaire also were plotted in this way.
The interpretation of the two rotated factors
in no way needs to be altered to account for
the content of these additional items.

The positive pole of the “humani-
tarianism” factor is established by
Items 8 (give up national sover- 
eignty) and 36 (abolish death pen- 
alty), the negative pole by Items 1 
(colored people inferior), 3 (war in- 
herent in human nature), 13 (consci- 
entious objectors are traitors), and 
39 (Japanese cruel by nature). One 
additional item, 10 (oppose double 
standard), has a moderate, positive 
“humanitarianism” saturation, while 
Item 5 (compulsory sterilization) re- 
ceives a moderate, negative satura- 
tion. It turns out that these are the 
eight items on which the communists 
score as the most tender-minded of 
the various samples. 

These rotated factors make it far 
more understandable why the com-
munists score the most tender-
minded on eight of the 14 T items.
All eight are saturated with “human-
itarianism.” It is generally known
that communist ideology supports 
the attitudes expressed by the posi- 
tive pole and opposes the attitudes 
expressed by the negative pole of this 
factor. Similarly, it is easily under- 
standable why the communists score 
as the most tough-minded on the re- 
maining six T items. All six are sat- 
urated with “religionism.” Again, 
it is consistent with communist ideol- 
ogy to agree with statements unfav- 
orable to religion and to disagree with 
statements based on religious doc- 
trine. 

When scores on the rotated factors 
are computed for the various groups, 
using the percentages given in Table 
1, the communists are highest on the 
“humanitarianism” factor, followed
in order by socialists, liberals, and 
lastly, by conservatives. Conversely, 
on the “religionism” factor, the
conservatives score highest among
middle-class samples, followed in
descending order by liberals, social-
ists, and finally, communists. The
order in the working-class samples is
similar, except that liberals score
higher on “religionism” than do con-
servatives. Furthermore, the order
of the frequency of responses of the
four middle-class political groups to
all but two items corresponds exactly
to the positions of these parties on
the Right-Left political axis (cf.
Table 1). A strong tendency of a sim-
ilar sort is apparent in the working-
class groups, but it must be remem-
bered that sampling errors are larger
here because of the smaller number of
subjects involved. Thus, most, if not
all, of the items on the tender-mind-
edness scale are clearly related to
political affiliation.

The preceding reanalysis of
Eysenck’s data in terms of “religion-
ism” and “humanitarianism” fits
neatly with the earlier research by
Ferguson (7) as well as that by Kirk-
patrick (8). Eysenck did not mention
Ferguson’s 1941 paper in his original
publications on the R and T factors
(1, 2), but in The Structure of Human
Personality (4) and in The Psychology
of Politics (5) he raises the question
as to whether “radicalism-conserva-
tism” and “tough-tender-minded-
ness” are superior to Ferguson’s
“religionism” and “humanitarian-
ism” in accounting for the data.
Eysenck decides in favor of his own
dimensions. It is instructive to follow
his reasoning as it applies to this prob-
lem. First, he argues that “radical-
ism” and “tender-mindedness” are
to be preferred on the grounds of
“semantic convenience.” “More con-
vincing,” Eysenck continues, “would
be experimental evidence showing that 
Tough-mindedness had correlates in 
other fields, such as, for instance, in the 
field of personality, which neither 
Religionism nor Humanitarianism 
possessed. A proof of this type will be 
attempted in a later chapter, and the
reader is asked to suspend his judg- 
ment until then..." (5, p. 147).4 

Such evidence would indeed be in- 
structive. We made a careful search 
of the remainder of The Psychology of 
Politics for this promised experi- 
mental test. Our search was in vain. 
The issue is never again raised in the 
book. This sort of treatment, of 
course, can hardly be considered an 
adequate resolution of the problem 
and serves only to reinforce the con- 
clusions we have reached. 


SUMMARY 


This paper is an evaluation of
Eysenck’s research on the factor
he calls tough-mindedness-tender-
mindedness. Using a 14-item T scale
dealing with such diverse religious
issues as Sunday observance, abor-
tion, divorce, companionate mar-
riage, etc., and with such other social
issues as race differences, the cruelty
of the Japanese, compulsory steriliza-
tion, the double standard, and con-
scientious objectors, Eysenck reports
mean scores which indicate that mid-
dle- and working-class British com-
munists are more tough-minded than
middle- and working-class British
conservatives, liberals, and socialists.

Our analysis of Eysenck's pub-
lished data clearly contradicts his
findings and conclusions. The major
points considered were:

1. Contrary to Eysenck’s conten-
tion that communists are the most
tough-minded, we find that on 8 of 14
items both middle- and working-class
communists were the most tender-
minded of all the political groups.
Eysenck overlooks these data in ar-
riving at his conclusions.

4 Italics ours.

2. Our recomputation of the mean 
tender-mindedness scores, from data 
presented by Eysenck, strongly sug- 
gests serious errors in the original 
calculation of his means. 

3. The corrected means and our 
analysis of Eysenck’s data on fre- 
quency of agreement to individual 
items necessitate a serious modifica- 
tion of his conclusions regarding dif- 
ferences in tender-mindedness among 
the various political groups. 

4. The 8 (out of 14) tender-mind- 
edness items on which the commu- 
nists were the most tender-minded of 
all parties turn out to involve content 
which corresponds closely to Fergu- 
son's (7) and Kirkpatrick's (8) di- 
mensions of humanitarianism. 

5. The remaining six items on 
which the communists were the most 
tough-minded all pertain to a reli- 
gious dimension, also found by Fergu- 
son and by Kirkpatrick. 

6. Consistent with common knowl- 
edge, Eysenck's conservatives score 
the highest on the religious items, 
followed in order by liberals, social- 
ists, and communists. Communists 
score the highest on the humanitar- 
ianism items, followed in order by 
socialists, liberals, and conservatives. 

Our analysis leads us to the con- 
clusion that tough-mindedness-ten- 
der-mindedness, as conceived and 
measured by Eysenck, has no basis in 
fact. It is based on miscalculations 
and a disregard for a significant por- 
tion of his data. It conceals rather 
than reveals the attitudinal differ- 
ences existing among political groups. 
In the Psychology of Politics (11)
and in a number of earlier articles and 
papers the writer has tried to do three 
things. In the first place, he has 
tried to construct a dimensional 
framework to deal with the interrela- 
tions obtaining between a wide va- 
riety of different social attitudes. The 
results of several experiments and 
analyses, carried out in different coun- 
tries and on different samples, led to 
the hypothesis that these relation- 
ships could be described with consid- 
erable accuracy in terms of two or- 
thogonal (independent) factors, la- 
beled radicalism-conservatism (R fac- 
tor) and tough-mindedness-tender- 
mindedness (T factor). No attempt 
was made, as Rokeach and Hanley
(16) claim, “to demonstrate that in-
dividual differences in social attitudes
are reducible to ‘two primary social
attitudes’”; such a reduction would
fail to take account of the specific
part of the variance, which is con-
siderable, and could not be effected
by the use of the factorial method, 
on which our conclusions were based. 

In the second place, an attempt 
was made to follow up a hypothesis 
formulated quite early in the history 
of this research (5), to the effect that 
the T dimension was correlated with 
certain personality variables, while 
no such correlation was postulated 
for the R dimension. The specific 
hypothesis tested was that intro- 
verted people would tend to be ten- 
der-minded, while extraverted people 
would tend to be tough-minded. In 
this connection, the hypothetical 
constructs “introversion” and “ex-
traversion” are used in terms of the
operational definition given them in
Dimensions of Personality (7), The 
Scientific Study of Personality (9), 
and The Structure of Human Per- 
sonality (10). 

In the third place, an attempt was 
made to link up both the attitude 
dimensions and the personality stud- 
ies with the main body of modern 
psychology by showing that the re-
sults found in our experiments could
be deduced from certain postulates 
of learning theory, and that in this 
way the particular structuring of 
variables observed could be explained 
by reference to a larger body of well- 
known facts. The claim is made in 
the Psychology of Politics (11) that 
these three aims have been accom- 
plished to a reasonable approxima- 
tion. In view of the fact that if this 
claim could be substantiated the work 
reported would be of some interest to 
social psychologists concerned with 
the integration of their field of study 
with that of general and experi- 
mental psychology, well-considered 
criticism showing possible weak- 
nesses in the chain of proof is wel- 
comed by the writer, as this would 
make possible the design of more con-
vincing experiments, or lead to a
more accurate restatement of the 
theory. It is to be regretted that the 
critique by Rokeach and Hanley 
(16) does not seem to be related 
closely enough to the facts of the 
case to be useful from this point of 
view. 

Their first point of criticism ap- 
pears to be that in one paper (8) the 
writer concluded that the communist 
groups tested had low scores on ten-
der-mindedness; this they claim to be
an error based on miscalculation. 
Computational errors do, of course, 
occur even when considerable care 
is taken. The writer does not believe 
that any such errors occurred in this 
case, for three reasons. In the first 
place, computations were done with 
all the usual checks, and were then 
repeated independently; identical re- 
sults were obtained the second time. 
This does not conclusively eliminate 
the possibility of computational er- 
rors, but makes their occurrence 
rather less likely. 

In the second place, the argument 
presented by Rokeach and Hanley in 
favor of their view is a very indirect 
one, as the published article does not 
contain enough detail to make ac- 
curate computation possible. As they 
themselves admit, in discussing the 
“0” responses, “there is only indirect 
evidence bearing on this point....”1
It is, in fact, impossible to argue back 
from the published figures in the way 
that Rokeach and Hanley are doing, 
and no rigorous development of their 
criticism is indeed attempted. When 
they say of their “recomputations”
that “in only two out of eight com- 
parisons are the means identical,” it 
should be clearly understood that 
this is quite irrelevant as their re- 
computations leave out part of the 
data. The fact that the means in two 
cases are identical is purely fortui-
tous; there is no reason why any of
the means should be identical.

1 Even this “indirect evidence” of theirs is
based on curious reasoning and factual inac-
curacies. Thus Rokeach and Hanley say: “He
[Eysenck] always scores “0” responses as
tough-minded....” This is quite untrue.
Several different scoring schemes have been
tried out at various times, such as the one
mentioned in the 1947 paper (6, p. 65). The
work of Melvin (15) has contributed greatly
to a final decision on the best method of deal-
ing with the problem of the “0” response.
Any recomputation based on false assump-
tions of this kind must be regarded as irrele-
vant.

In the third place, Rokeach and 
Hanley have been very partial in 
their selection of evidence. They 
say that “in view of the foregoing 
analysis, Eysenck’s continued con- 
tention that communists are more 
tough-minded than conservatives, 
liberals, and socialists, is not sup-
ported by his published data.” Yet
in the Psychology of Politics (11, p.
141), there is given a detailed dia-
gram of the scores made by com-
munists and fascists, as compared
with a group of matched subjects of
conservative, liberal, and socialist
attitudes; this diagram bears out
completely the conclusion criticized
by Rokeach and Hanley. The figures
on which it is based, contained in a
doctoral dissertation by Coulter (4),
were available to at least one of the
two critics, and the published dia-
gram gives sufficient detail to show
that this independent research sub-
stantiates the contention that com-
munists are more tough-minded than
people supporting other political
parties (with the exception of the
fascists). The failure to mention this
corroborative evidence is difficult to
explain.

Equally important in this connec-
tion is another research, completed
only recently, and as yet unpublished.
This study by Nigniewitzky could
not have been known to Rokeach
and Hanley, but the results are very
relevant to the question of whether
the original data can be duplicated
in repeated and independent studies.
Basing his study on a properly strati-
fied sample of the French population,
and using a slightly modified and im-
proved form of the T scale, Nignie-
witzky found that communists had
a mean score of 10.3; fascists had a
mean score of 10.2; communist fel-
low-travellers had a mean score of
10.2. The mean score of supporters
of all the other main French parties
was 17.6! These figures are even 
more impressive than those found in 
England; they strongly support our 
view regarding the position of com- 
munists in the two-dimensional factor 
space. 

One further result from Nignie-
witzky’s study may be of interest.
He found that in an analysis of vari-
ance carried out over the main politi-
cal parties in France, the score on the
T dimension gave an even better dif-
ferentiation than did the R score (in
Anglo-Saxon countries the opposite
is usually found). Other scales, such
as the F scale, which bears consider-
able similarity to the T scale, and
correlates reasonably highly with it
in most studies, were very much in-
ferior to both the T and the R scales.
These facts, added to those reported
in the Psychology of Politics may serve
as an adequate comment on Rokeach
and Hanley’s contention that “tough-
mindedness-tender-mindedness, as
conceived and measured by Eysenck,
has no basis in fact. It is based on
miscalculation and a disregard for a
significant portion of his data. It con-
ceals rather than reveals the attitudi-
nal differences existing among politi-
cal groups.”

All in all, then, our answer to
Rokeach and Hanley is that proper
care was observed in the calculation
of the data; that their criticism is not
based on rigorous calculation, but on
argument and surmise; and that two
independent repetitions of the study,
one of which was known to Rokeach
and Hanley, give results even more
striking in their support of our hy-
pothesis than did the original study
investigated by Rokeach and Hanley.

2 Historically the T scale was published
several years before the F scale. The T dimen-
sion was isolated in 1944 (5), and the scale
published in 1947 (6). The F scale was pub-
lished in 1950 (1), without mention of the T
scale in spite of the obvious similarities.
Neither was Ferguson’s (13) contribution
mentioned, which also is very relevant to the
concepts underlying the F scale. Rokeach
and Hanley take the author to task because
he “did not mention Ferguson's 1941 paper
in his original publications on the R and T
factors.” They omit to add that in an even
earlier paper, not quoted by them at all, the
writer (5) had thoroughly and in detail dis-
cussed the contribution not only of Ferguson
(12), but also of Carlson (13), Thurstone (17),
and many others.

Allied to the criticism regarding 
the alleged computational errors is 
Rokeach and Hanley’s discussion of 
the detailed results of the 1951 paper. 
They take the writer to task because 
“in coming to his conclusions (he) 
makes no reference whatever to these 
data on the individual items.” This 
is the first time the writer has been 
criticized for obeying Rule 1.22, Sub- 
section d, of the APA Publication 
Manual (2), which reads: “Data 
should be presented no more than 
once. Although it is appropriate to 
refer to tabular data in the text of an 
article, care should be taken not to 
repeat data unnecessarily in the sec- 
tion on results, in the discussion, and 
in the summary.” The tabular pres- 
entation was sufficiently detailed for 
Rokeach and Hanley to draw conclu- 
sions from it at considerable length; 
no editor would have permitted the 
writer a discussion of similar length 
in addition to the tabulation. How- 
ever, the main point of their discus- 
sion indicates that Rokeach and Han- 
ley fail to understand the chief char- 
acteristic of dimensional analysis. 
Communists as a group have loadings 
on two orthogonal factors; conse-
quently their responses to individual
items are determined not only by 
their tough-mindedness, but also by 
their radicalism. Items relating to 
anti-Semitism, war attitudes, the 
death penalty, and so forth should be 
answered in the affirmative because 
of their loading on tough-mindedness,
but in the negative because of their
loading with radicalism; the outcome
loading with radicalism; the outcome 
of the ensuing conflict will depend on 
the respective loadings, as well as on 
the exact position of each person in 
the communist group on the two fac- 
tors. The T score combines in equal 
proportions radical and conservative 
items and thus gets rid of the compli- 
cation introduced by the R factor; 
in just the same way the R score, 
combining in equal proportions
tough-minded and tender-minded 
items, gets rid of the complications 
introduced by the T factor. This 
point appeared too obvious and in- 
deed elementary to discuss at length 
in the paper; the reader interested in 
the detailed construction of the 
scales, and the problems encountered, 
may be referred to a separate publica- 
tion by Melvin (15). 
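
The balancing argument just stated can be illustrated with a small sketch in Python. It assumes, purely for illustration, that each item's keyed response tendency is a linear sum of a T component and an R component; the loadings, the positions on the two factors, and the function name scale_score are all hypothetical and are not taken from the published scales. Under that assumption, a scale whose R loadings sum to zero gives the same total to two respondents who share a T position but differ on R.

```python
def scale_score(items, t_position, r_position):
    """Total score under an assumed linear model: each item adds t_load*T + r_load*R."""
    return sum(t * t_position + r * r_position for t, r in items)

# Hypothetical (t_load, r_load) pairs: half the items radical (+r), half conservative (-r).
balanced_t_scale = [(0.6, 0.4), (0.6, -0.4), (0.5, 0.3), (0.5, -0.3)]

# Two hypothetical respondents with the same T position and opposite R positions.
score_radical = scale_score(balanced_t_scale, 1.0, 2.0)
score_conservative = scale_score(balanced_t_scale, 1.0, -2.0)

# A single item, by contrast, still reflects the respondent's R position.
item_radical = scale_score(balanced_t_scale[:1], 1.0, 2.0)
item_conservative = scale_score(balanced_t_scale[:1], 1.0, -2.0)

print(score_radical, score_conservative, item_radical, item_conservative)
```

The sketch shows only what follows if the linear assumption and the exact balance of loadings both hold; it does not establish that either condition is met by the actual T items.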

We may now turn to the second 
major criticism presented. In dis- 
cussing the similarity between his 
dimensional scheme and that pre- 
sented by Ferguson (12, 13), the 
writer (11, p. 147) has commented 
that a rotation of 45° would turn the 
one pair of reference axes (T and R) 
into the other (humanitarianism and 
religionism). There is an obvious 
semantic convenience in employing 
widely used and accepted terms, 
such as radicalism-conservatism, par- 
ticularly when there is evidence that 
the scale for measuring such a factor 
coincides with the actual major politi- 
cal party groupings (6). Further- 
more, it seems more reasonable to 
refer to communists as “tough- 
minded radicals,” or to fascists as 
“tough-minded conservatives,” than
to refer to conservatives as “religious 
antihumanitarians,” or to socialists 
as “nonreligious humanitarians,” as 
we would have to do if we accepted 
the Ferguson scheme. Indeed, this 
rechristening seems to lead to a re-
ductio ad absurdum when we find
Rokeach and Hanley arguing that
“communists score the highest on the
humanitarian items.” To find com-
munists considered as the most
“humanitarian” group of all is cer-
tainly a little startling!

However, this argument regarding
the superiority of the R and T dimen-
sions on the basis of semantic con-
venience was only used by the au-
thor in a very subsidiary way. As
pointed out in The Psychology of
Politics, in a passage quoted by
Rokeach and Hanley, “more con-
vincing would be experimental evi-
dence showing that Tough-minded-
ness had correlates in other fields,
such as, for instance, in the field of
personality, which neither Religion-
ism nor Humanitarianism possessed.
A proof of this type will be attempted
in a later chapter...” (11, p. 147).
Rokeach and Hanley comment:
“Such evidence would indeed be in-
structive. We made a careful search
of the remainder of The Psychology
of Politics for this promised experi-
mental test. Our search was in vain.
The issue is never again raised in the
book.” The writer finds this comment
difficult to understand. A whole
chapter, entitled Ideology and Tem-
perament, is given over to a discus-
sion of the experimental evidence re-
lating to this problem, and several
different approaches are reported, all
of which support the hypothesis that
tough-mindedness and extraversion
are related to each other, as required
by our hypothesis. The reader’s at-
tention is drawn particularly to Fig-
ure 30, on p. 178 of The Psychology of
Politics, which reports the results ob-
tained by George (14) in a direct at-
tack on this problem. It will be seen
there that his measure of extraversion
is situated almost exactly on the
tough-minded factor axis. Anyone
familiar with dimensional analysis
will be able to see for himself the re- 
sult of rotating the axes through an 
angle of 45°, thus bringing them in 
line with the Ferguson system. This 
would considerably reduce the cor- 
relation of extraversion from its pres- 
ent reasonably high size, and would 
leave us with two rather low and un-
important correlations with religion-
ism (negative) and with humanitar-
ianism (negative). Furthermore, the
relation between extraversion and
tough-mindedness observed in this
study was predicted in terms of
theoretical considerations; no such
prediction was made to our knowl-
edge with respect to Ferguson’s two
factors. Rokeach and Hanley’s fail-
ure to see the relevance of this whole
chapter, and of this study in particu-
lar, to the point in question is diffi-
cult to understand.
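
The arithmetic behind the rotation argument above can be sketched as follows. This is a minimal Python illustration that assumes a hypothetical measure with a loading of 0.5 on one axis and zero on the other (the value is invented, not read from Figure 30), and a helper named loadings_after_rotation introduced only for this example. After a 45-degree rotation the single loading splits into two loadings, each smaller by a factor of the square root of two; the signs depend on how the rotated axes are oriented, which the sketch does not attempt to match to Ferguson's factors.

```python
import math

def loadings_after_rotation(loading_a, loading_b, degrees=45.0):
    """Project a variable's two loadings onto axes rotated counterclockwise by `degrees`."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    return (loading_a * c + loading_b * s, -loading_a * s + loading_b * c)

# Hypothetical measure lying exactly on one axis: loading 0.5 there, 0.0 on the other.
new_a, new_b = loadings_after_rotation(0.5, 0.0)

# Each rotated loading has magnitude 0.5 / sqrt(2), roughly 0.35.
expected_magnitude = 0.5 / math.sqrt(2)
print(abs(new_a), abs(new_b), expected_magnitude)
```

The total variance accounted for by the two factors is unchanged by the rotation; only its division between the two axes differs.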
They also fail to take into account
what to the writer is the most im-
portant chapter in the whole book,
viz., the concluding chapter entitled
“A Theory of Political Action.” Here
an attempt has been made to deduce
the actual structure of attitudes
found, as well as the relationship of
the T factor to extraversion-introver-
sion, from general learning theory;
it was also deduced that there should 
be no consistent relationship be- 
tween the R factor and the main per- 
sonality variables. None of the rela- 
tions pointed out in this chapter, 
and none of the deductions made, 
would be applicable to the Ferguson 
factors. Rokeach and Hanley do not 
mention this argument, although to 
the writer it appears the most 
cogent one in coming to a decision be- 
tween the two rival schemes. This 
failure to come to grips with the 
writer's theory as a whole appears to 
him the outstanding weakness in the 
critique to which this is the reply. 
The authors have quite arbitrarily 
picked out certain isolated points, 
have disregarded the great mass of 
evidence supporting each separate 
conclusion, as well as the intercon- 
nections between the different parts 
of the research under review, and 
have come to conclusions which are 
not in fact borne out by a careful 
perusal of the evidence. The reader 
will be able to form his own opinion 
after comparing the facts as outlined 
in The Psychology of Politics with 
Rokeach and Hanley’s critique. 
Our reaction to Eysenck’s
reply is that it evades or beclouds
each of the specific issues we raised
concerning his work on “tender-
mindedness.” To our mind, the very
title of his reply is misleading. It is
not The Psychology of Politics (5) we
have criticized. The reader whose at-
tention is directed to this book will
not be able to see for himself the seri-
ous computational errors, the omis-
sion of contradictory evidence, and
the erroneous conclusions we have
pointed to in our critique.1 Eysenck
republishes in this book a good deal,
but not all, of the data he presented
in earlier journal articles. Most con-
spicuous is the omission of his data
on responses of communist and
working-class samples to the individ-
ual items of his scale. He republishes
these data only for middle-class con-
servatives, liberals, and socialists.
It is precisely the omitted data which
show that the communists are more
“tender-minded” than conservatives,
liberals, and socialists on 8 of the 14
items, and it is these same data that
enabled us to recompute mean
T scores for his various groups and
thereby discover that the means he
reports are incorrect. It is for this
reason that we felt it necessary to
scrutinize Eysenck’s earlier and, by
far, fuller reports (3, 4), rather than
the data Eysenck has chosen to pre-
sent in his book.

1 Examination of this book reveals shortcom-
ings other than those we have considered
here. Many of these will be dealt with in a
separate critique by Professor Richard
Christie in a forthcoming issue of this journal
(1).

Now let us examine Eysenck’s re-
ply to each of our criticisms.
Concerning responses to individual 
T items. Eysenck states that by 
“dimensional analysis” it is possible 
to reconcile the fact that the com- 
munists, whom he describes as being 
tough-minded, turn out to be the 
most tender-minded of any group on 
8 of the 14 T items. His reasoning 
runs somewhat as follows. Each of 
the 14 items not only measures indi- 
vidual differences along the tender- 
mindedness axis but also along the 
radicalism axis. Suppose that agree- 
ment with a specific item is scored 
both as tender-minded and as radical, 
and that the subject is a communist. 
Eysenck attributes such agreement 
to radicalism and not to tender- 
mindedness. He does not, however, 
consistently apply this line of thought 
to the other items. Thus, suppose 
that agreement with an item is scored
both as tough-minded and radical, 
and the subject is a communist. If 
Eysenck were to follow his rule, such 
agreement should be attributed to 
radicalism and not to tough-minded- 
ness. This he does not do. Instead, 
he attributes such agreement to the 
operation of tough-mindedness. Sup- 
pose, again, that agreement is scored 
both as tender-minded and conserva-
tive, and that the subject is a con-
servative. If Eysenck were consist-
ent, he should interpret such agree-
ment as being due to conservatism,
and not to tender-mindedness. This
time, however, such agreement is at-
tributed to both conservatism and
tender-mindedness.

We cannot escape the impression
that following Eysenck’s line of argu-
ment permits one to shift one’s ex-
planation of results from one axis to
the other according to which axis one
wishes to grind.

Eysenck asserts also that the rea- 
son he never referred to the contra- 
dictions in the item response frequen- 
cies is that he was prevented from 
doing so because he was “obeying 
Rule 1.22, Subsection d, of the APA 
Publication Manual,” which is to the 
effect that data should not be re- 
peated unnecessarily. 

We find ourselves hesitant to take 
this explanation seriously. We will 
take space here only to point out that 
the APA Publication Manual 
Eysenck refers to was published in 
1952. The article containing the 
contradictory findings appeared in 
1951—in the British Journal of So- 
ciology! 

Concerning Eysenck's incorrect
means. When we recomputed means 
for Eysenck’s groups from the data 
he presents on response frequencies to 
items, we found serious discrepancies 
between his means and ours. We 
suggested that his means were in er- 
ror to such an extent that his con- 
clusions regarding differences in 
tough-mindedness among commu-
nists, socialists, liberals and conserva-
tives had no basis in his data.
Eysenck replies “computations were 
done with all the usual checks, and 
were then repeated independently.” 
He denies the correctness of our re- 
computed means because, he says, it 
is “impossible” to do so using the 
response frequencies. These recom- 
putations, he continues, are based on 
“false assumptions” and are “irrele-
vant,” because we do not know how
the “0” responses were scored. We
are referred to the work of Melvin
who “has contributed greatly to a fi-
nal decision on the best method of
dealing with the problem of the ‘0’
response.”

Several comments are in order here. 

1. The most convincing way to
demonstrate the fallaciousness of a
critic's recomputations would be to
make available the raw data and de-
tailed instructions on how they are
to be scored. Eysenck has made no
effort to do this.

2. We do not believe that Eysenck’s
conclusions are worthy of serious at- 
tention if different ways of scoring 
“0” responses can make such whop- 
ping differences in findings. 

3. Eysenck himself describes on 
page 276 of The Psychology of Politics 
exactly how Melvin scores the T
scale. As far as one can tell from
Eysenck’s description (5, p. 65), his
original scoring method is identical
with Melvin’s.

4. However the “0” responses may
have been handled, the crucial data in
Eysenck’s table of response frequen-
cies to individual T items are already
classified into proportions of “Yes”
answers, and the accompanying text
describes the scoring system in terms
of “Yes” answers to items (4, pp. 20[?]-
203). Thus, if his means fail to cor-
respond with the means which any-
one may calculate from the data pre-
sented in this table, Eysenck’s means
must be incorrect.

5. We have noted previously the
carelessness which Eysenck has shown
in presenting factual material. Let us
note two further examples from his
reply.

First, Eysenck gives in his reply
the year 1955 as the date of Melvin’s
Ph.D. thesis. In The Psychology of
Politics reference is also made to this
thesis. On page 276 the date is
1954; and on page 301 the date is
1953! (We note too that Eysenck
refers to this unpublished thesis
as “a separate publication by Mel-
vin.”)

Second, Eysenck writes in his re-
ply: “Rokeach and Hanley take the
author to task because he ‘did not
mention Ferguson’s 1941 paper in his
original publications on the R and T
factors.’ They omit to add that in
an even earlier paper, not quoted by 
them at all, the writer (5) had thor- 
oughly and in detail discussed the 
contribution not only of Ferguson 
(12), but also of Carlson (13), Thur- 
stone (17), and many others.” 
A careful consideration of this
statement shows that “Ferguson’s
1941 paper” and the “Ferguson (12)”
paper are not the same paper! The
“Ferguson (12)” cited by Eysenck
turns out to be another study by Fer-
guson published in 1939.2 We suggest
that if one now re-reads the above
quotation it will have an entirely dif-
ferent meaning!

These two examples demonstrate
a carelessness by Eysenck in
reporting and presenting factual
material. Application of some of the
“usual checks,” which Eysenck states
he employed in calculating his means,
would also have prevented these er-
rors from occurring.
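
The recomputation referred to in point 4 above rests on a simple identity, which can be sketched in Python. The example assumes that every item is scored 1 for the keyed answer and 0 otherwise, with no intermediate credit for “0” responses (the very point in dispute), and it uses an invented set of four respondents and three items rather than any published frequencies; the function name mean_scale_score is likewise hypothetical. Under that scoring, the mean total score of a group equals the sum of the per-item proportions giving the keyed answer.

```python
def mean_scale_score(keyed_proportions):
    """Mean total score as the sum of per-item proportions of keyed answers."""
    return sum(keyed_proportions)

# Invented 1/0 item scores for four respondents on three items; NOT published data.
responses = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]
n = len(responses)
n_items = len(responses[0])

# Proportion of respondents giving the keyed answer to each item.
proportions = [sum(row[i] for row in responses) / n for i in range(n_items)]

# The same mean computed two ways: from item proportions and from individual totals.
mean_from_proportions = mean_scale_score(proportions)
mean_from_totals = sum(sum(row) for row in responses) / n

print(mean_from_proportions, mean_from_totals)
```

If the scoring gave some credit to undecided answers, the item proportions alone would no longer determine the mean, and the additional frequencies would be needed.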
Concerning the independent repeti-
tions of Eysenck’s research. Eysenck
states that two independent repeti-
tions of his original study by Coulter
and Nigniewitzky “give results even
more striking in their support of our
hypothesis.”

The internal consistency and computational correctness of a particular set of data cannot be established by referring to two or even more than two independent studies. On the basis of our evaluation of Eysenck’s published research, we have come to the conclusion that his data do not support his hypothesis. Whether the two unpublished studies of his students confirm or deny Eysenck’s hypothesis is an entirely separate issue. These studies deserve

2 Invol i 
Baa of one ae: eae mb 
ee A fe Paper describes a Religionism 
was missing yuck in 1944 (2) complained 

from Ferguson’s 1939 study. Eysenck’s 1944 paper makes no mention of Ferguson’s 1941 paper.

evaluation strictly on their own merit. We look forward to the publication of these reports.³

Concerning the alleged superiority 
of Eysenck's factors over Ferguson's 
factors. We pointed out that one may 
more easily understand Eysenck's 
results by referring to Ferguson’s 
“Religionism” and “Humanitarianism” factors rather than to Eysenck’s R and T. We called attention to the
fact that Eysenck promises in The 
Psychology of Politics to produce “ex- 
perimental evidence showing that 
Tough-mindedness had correlates in 
other fields, such as, for instance, in 
the field of personality which neither 
Religionism nor Humanitarianism 
possess.” 

We then pointed out that Eysenck 
never again discusses this issue. The 
matter is simply dropped! Eysenck 
replies that he finds our “comment 
difficult to understand. A whole 
chapter, entitled ‘Ideology and Tem- 
perament,’ is given over to a discus- 
sion of the experimental evidence re- 
lating to this problem, and several 
different approaches are reported, all 
of which support the hypothesis that 
tough-mindedness and extraversion 
are related to each other, as required 
by our hypothesis.” 

The only way possible to demon- 
strate the proof he promises of the 
superiority of his factors over Fergu- 
son’s is by a pitting procedure. It is 
not sufficient to present only the cor- 
relations between T and selected 
personality variables. It is necessary 
to produce, in addition, the com- 
parable correlations between Fergu- 
son’s factors and the same personality 
variables. Only then can one make a 
choice between the two alternative 
explanations. This is the test 
Eysenck promises but fails to make. 

³ A detailed analysis of the deficiencies in Coulter's study, as reported by Eysenck, will be found in Christie's paper (1).

There is no reference whatever in his chapter on Ideology and Temperament (or in his Figure 30, for that
matter)* to the Ferguson factors or 
their correlates. Indeed, nowhere 
does he suggest that he has computed 
scores on these dimensions and cor- 
related them with the relevant per- 
sonality variables. We, therefore, 
must reaffirm our earlier statement: 
“Such evidence would indeed be 
instructive. We made a careful 
search of the remainder of The Psy- 
chology of Politics for this promised 
experimental test. Our search was 
in vain. The issue is never again 
raised in the book.” 

* Eysenck (6) asserts that anyone can see from his Figure 30 (5, p. 178) that rotating 45° to the Ferguson axes reduces “the correlation of extraversion from its present reasonably high size, and would leave us with two rather low and unimportant correlations with religionism (negative) and humanitarianism (negative).” Following this ad hoc suggestion, we made the necessary rotation on Figure 30, measured the loading of extraversion on the new Religionism axis, and found that it is .88 of the magnitude of extraversion's loading on the T axis. The text (5, p. 179) states that the cor-

A final suggestion. If one is interested, for the sheer fun of it, in confirming Eysenck's conclusion that fascists and communists are similar to each other and different from democratic subjects, we offer the following recipe. Construct a 20-item scale. Include 10 items referring to acceptance of communist ideology (e.g., “Communism is the most desirable form of government,” etc.). Let the remaining 10 items refer to acceptance of fascist ideology (e.g., “Fascism is the most desirable form of government,” etc.). Give the questionnaire to communists, socialists, liberals, conservatives, and fascists. Factor items. Emerge with two factors. Call one “radicalism,” the other “tough-mindedness.” Score agreement with communist items and disagreement with fascist items as “radical.” Discover that communists are the most radical, fascists the most conservative, and democratic groups in between. Score agreement with both communist and fascist items as “tough-minded.” Discover that communists and fascists are both “tough-minded,” because they agree with 10 of the 20 items. Find democratic groups to be “tender-minded,” because they agree with none of the items.
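The scoring in this recipe can be made concrete with a minimal sketch. The three idealized respondents, the 0/1 answer coding, and the function names below are illustrative assumptions, not anything reported by Eysenck; the "tough-mindedness" score here simply counts agreement with any of the 20 items.

```python
# Hypothetical illustration of the 20-item recipe; respondents and coding are invented.
COMMUNIST_ITEMS = range(0, 10)   # items 1-10: acceptance of communist ideology
FASCIST_ITEMS = range(10, 20)    # items 11-20: acceptance of fascist ideology


def radicalism(answers):
    """Agreement with communist items plus disagreement with fascist items."""
    agree_communist = sum(answers[i] for i in COMMUNIST_ITEMS)
    disagree_fascist = sum(1 - answers[i] for i in FASCIST_ITEMS)
    return agree_communist + disagree_fascist


def tough_mindedness(answers):
    """Agreement with any item, whether communist or fascist."""
    return sum(answers)


# Idealized respondents (assumed): 1 = agree, 0 = disagree.
respondents = {
    "communist": [1] * 10 + [0] * 10,
    "democrat": [0] * 20,
    "fascist": [0] * 10 + [1] * 10,
}

scores = {
    group: (radicalism(answers), tough_mindedness(answers))
    for group, answers in respondents.items()
}

for group, (r, t) in scores.items():
    print(f"{group:10s} radicalism={r:2d} tough-mindedness={t:2d}")
```

Under these assumptions the communist and fascist respondents receive the same tough-mindedness score while falling at opposite ends of radicalism, a pattern that follows from the item content alone.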

In our opinion, Eysenck is caught in precisely this sort of trap. We hope, as a result of this exchange, that others will be able to steer clear of it.

consider a correlation of .41 as “reasonably high,” but a correlation of .36 as “rather low and unimportant.”


INTELLIGENCE AND FAMILY SIZE


ANNE ANASTASI 
Fordham University 


The past twenty-five years have witnessed a growing interest on the part of psychologists, geneticists, and demographers in the relationship between intellectual level and family size. As a result of the negative correlations commonly found between intelligence test scores of children and number of siblings, several writers have predicted a drop in the intellectual level of the population. Data have recently become available from a variety of sources, however, which raise serious doubts regarding such a conclusion. Careful analyses of the problem, moreover, have revealed that it is far more complex, both methodologically and theoretically, than was originally supposed.


THE GENERAL QUESTION OF
DIFFERENTIAL FERTILITY


In current English demographic usage, the term “fecundity” signifies capacity to produce living offspring, while “fertility” refers to actual performance (60). Some investigators have differentiated further between number of pregnancies, number of births (including stillbirths), number of children born alive, and number of children alive at the time of the survey (71). For practical reasons, however, most studies have been concerned only with the last-named category. To be precise, one should also identify stepchildren within each family, but not all studies have done so. The
number of such cases, of course, is 
relatively small and would not ma- 
terially affect the conclusions. 

Mention should also be made of 
the frequently reported “crude birth 
rate,” which is simply the number 
of births during a year, per thousand 
population. Such a figure reflects 
not only mean size of family but also 
other demographic characteristics, 
such as the age distribution, sex 
ratio, and marriage habits of the 
population. Adjusted birth rates are 
sometimes computed, in which one 
or more of these characteristics are 
controlled. On the other hand, when 
average size of sibship is considered, 
as in most studies on fertility and 
intelligence, no account is taken of 
childless families, proportion of un- 
married persons, and differential mor- 
tality rates. It is thus apparent that 
fertility statistics can be variously 
expressed and that their specific im- 
plications need to be carefully scru- 
tinized. 
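The dependence of the crude birth rate on age structure can be shown with a small worked sketch. All figures below are invented for illustration; the function name and the two hypothetical populations are assumptions, not data from the studies cited.

```python
def crude_birth_rate(births_in_year, total_population):
    """Births during a year per thousand population."""
    return 1000 * births_in_year / total_population


# Two hypothetical populations of equal size in which women of childbearing
# age have the same yearly rate of 100 births per 1,000 women.
BIRTHS_PER_1000_WOMEN = 100
TOTAL_POPULATION = 100_000

women_of_childbearing_age = {
    "younger population": 20_000,
    "older population": 10_000,
}

rates = {}
for name, women in women_of_childbearing_age.items():
    births = women * BIRTHS_PER_1000_WOMEN // 1000
    rates[name] = crude_birth_rate(births, TOTAL_POPULATION)
    print(f"{name}: {rates[name]:.1f} births per thousand")
```

In this sketch the two crude rates differ by a factor of two even though the women in both populations bear children at the same rate, which is why such a figure cannot be read as a measure of family size.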

Most demographers agree that, although capacity for procreation may be influenced by genetic factors and by such environmental conditions as nutrition and climate, variations in fertility among existing human populations arise chiefly from social and psychological factors (60, 96). Similar explanations have been offered for progressive changes in fertility within given populations. In the United States and in most European countries, the birth rate declined steadily
from the early or late nineteenth 
century to the 1930’s, reaching its 
lowest point during or immediately 
following the economic depression. 
This decline has usually been attrib- 
uted to socioeconomic factors, such 
as urban migration, the rising stand- 
ard of living, and the increasing ex- 
pense of rearing children (96). Fol- 
lowing World War II, the trend 
was reversed, the crude birth rate for 
these countries rising markedly and 
reaching a peak in the late 1940's. 
Much of this rise is believed to be 
due to the abnormally large number 
of marriages during and immediately 
after World War II, as well as to 
births which had been postponed 
during the depression or early part 
of the war, or which were advanced 
from later years because of the eco- 
nomic prosperity (96). Some demog- 
raphers, however, see evidence of a 
genuine increase in size of completed 
families (96). 
Differential fertility in various subdivisions of a population was recorded as early as the seventeenth century in Europe (96). In a survey of British
demographic writings appearing be- 
tween 1660 and 1760, for example, 
Kuczynski (52) found references to 
the greater fertility of rural as com- 
pared with urban dwellers, and of the 
poor as compared with the wealthy. 
The data upon which these early opinions were based are, of course, meager and difficult to interpret. In more recent times, however, a mass of data has accumulated which shows an inverse relationship between family size and such variables as income, occupational level, and amount of education (47, 61, 84, 96). The previously mentioned decline in birth rate which began in the nineteenth century was more rapid in the upper than in the lower socioeconomic and educational classes, thus either producing or augmenting such fertility differentials.
Certain other findings regarding 
fertility differentials are of special 
interest. One of the most persistent 
differentials, occurring even when 
other subgroup differences in family 
size are absent or reversed, is that 
between urban and rural groups.
This difference usually increases when 
the comparison is between agricul- 
tural and nonagricultural occupa- 
tions. The largest families are thus 
found among those rural residents 
who are engaged in agriculture (96). 
Another suggestive finding, reported in an American survey, is that family size tended to be more closely related to the educational level of the wife than to that of the husband. Finally, it is noteworthy that a comparison of native whites and Negroes in the United States reveals that the fertility rates of both groups are affected in the same way by such factors as socioeconomic status, education, and urban-rural residence.
All of the above socioeconomic and educational differences in fertility have been interpreted as indirect evidence for a relationship between fertility and intelligence. Insofar as occupational and educational level are positively correlated with intelligence test scores, and urban scores average higher than rural, the inverse relation of these variables with family size suggests a similar inverse relation between intelligence and family size. Even prior to the direct study of the latter relationship, therefore, its implications had been considered by geneticists and demographers.
There are, however, a number of noteworthy exceptions to the relationships cited above. For example, several local studies in American, Canadian, French, German, Norwegian, and Swedish cities have shown that the negative correlation between income and family size prevails only up to a certain income level (96). Moreover, within the higher income groups, there is sometimes a tendency for family size to be positively correlated with income. Similarly, exceptions can be found to the hypothesized relation between fertility and occupational level (30).
Special mention should be made of the Indianapolis survey, conducted in 1941 (48, 49, 50, 51, 106, 107). This study began with a short questionnaire administered to approximately 41,500 native, white couples living in Indianapolis, in which the wife was under 45 years of age and neither spouse had previous marriages. The major data were gathered in an intensive interview survey of about 1,500 women selected from the larger group because they met certain additional specifications. The survey was designed to test a series of hypotheses regarding social and psychological factors affecting fertility. Many of the findings are difficult to interpret, however, because of the complex interrelationships and interactions of the variables considered (cf. 48, 51). Nevertheless, one observation is of special interest in the present connection. Although the usual inverse relationship between socioeconomic indices and family size was found over a large part of the range, this relationship changed from negative to positive within the highest income levels. Such a result held in both the total sample and the subsample chosen for intensive study. It was likewise found to be true in another subsample including only wives aged 40 to 44 years, who were more likely to have completed families at the time of the survey (50).
One of the hypotheses proposed to account for such results is based upon the relation of upward social mobility to family size (104). There is some direct evidence in support of this
hypothesis. Baltzell (6), for example, 
found larger families among residents 
of Philadelphia listed in both the 
1940 Who's Who and the Social Reg- 
ister than among those listed only in 
Who's Who. He argued that the lat- 
ter had more often achieved their 
socially prominent position through 
their own efforts rather than having 
occupied it throughout life. They 
had thus shown more upward mo- 
bility than the Social Register group. 
Other analyses conducted within the same study, involving persons with a private school education and those whose families had been listed in Who's Who since 1900, likewise indicated greater fertility among the socially less mobile groups. Attempts to test the hypothesis among intermediate socioeconomic groups have yielded inconsistent results. At such levels, it is difficult to find groups in
which the drive for upward social 
mobility can be assumed to be negli- 
gible. For example, persons whose oc- 
cupational levels are higher than 
those of their fathers or who have ad- 
vanced in occupational level within 
their own vocational history have 
been compared with those who failed 
to show such changes (7). Even the
“control” group which has mani- 
fested no occupational changes of 
this type, however, may have mani- 
fested upward mobility in other ways. 

It is apparent from various sources that, even prior to the postwar “baby boom,” generalizations regarding the inverse relationship of fertility to socioeconomic factors needed qualification. Following World War II, fertility increased relatively more among those groups in which the prewar level had been low, i.e., among urban residents and persons in higher educational, occupational, and income levels. Thus the increasing
birth rate has had the effect of nar- 
rowing fertility differentials within 
the population. These changes are 
vividly illustrated by some of the 
data gathered in recent sample sur- 
veys by the United States Bureau of 
the Census (98, 99). For example, 
when married women between the 
ages of 15 and 44 were considered 
in the 1952 survey, no relationship 
was found between income level and 
total number of children, except for 
a proportionately large number of 
children in the lowest income group 
(under $1,000). When the same 
women were classified according to 
husband’s occupation, some reversals in the usual fertility differentials were observed. Thus, professional, semiprofessional, and managerial groups had proportionately more children than did certain lower occupational groups, such as clerical and sales workers.¹ Fertility differentials among educational groups have also decreased (98). The remaining differences between educational classes
are still large, however, as are those 
between urban and rural groups. 

A similar narrowing of fertility differentials has been reported for a
number of European countries (cf. 
16, 96). For example, there is evi- 
dence of an actual increase in family 
size in the upper social levels of Swe- 
den and Great Britain. Moreover, 
the decrease in fertility differentials 
holds for comparisons between coun- 
tries as well. Thus it is largely in the 


¹ The fact that significant differences in family size were obtained among occupational groups, while no such differences were found among income levels within the corresponding range, reflects the low correlation between income and occupational level. The range of income within each of the major occupational categories is very wide, and the income differential between such occupational groups has been declining for many years.

low-fertility countries that the birth
rate has increased. 

Mention may also be made of con- 
ditions prevailing within high-fer- 
tility areas. These are identified as 
regions having a crude birth rate in 
the vicinity of 40 per thousand, or 
higher (96). Such birth rates have 
been reported among peoples of
Africa, Asia, and Dainai 
among non-European peoples of Oceania; and in certain parts of Europe. For most of these groups, demographic data are very inadequate.
In general, the birth rate in these 
areas has shown no upward or down- 
ward trend, unlike the birth rate in 
low-fertility areas. Moreover, the usual fertility differentials have not been observed in high-fertility areas. In Brazil, for example, the urban-rural differential is the only one that follows the familiar pattern. Reported socioeconomic differentials in that country are slight and not in the expected direction. Thus within each type of work, the fertility of employers and own-account workers is generally higher than that of employees and unpaid family workers. The data from high-fertility areas, however meager, provide another major exception to the negative correlation between socioeconomic level and family size.


STUDIES OF INTELLIGENCE TEST
PERFORMANCE IN RELATION
TO FAMILY SIZE


While much has been inferred about the relationship between fertility and intelligence from studies of socioeconomic factors, a more direct approach to this problem has also been utilized. Current concern with this question on the part of demographers is illustrated by the inclusion of a session on “methods of research on relations between intelligence and fertility” in the World Population Conference held under United Nations auspices in Rome in 1954 (97). Similarly, a special working group was
called together by Unesco to dis- 
cuss the implications of fertility dif- 
ferentials for the intellectual develop- 
ment of the population. This group 
met in Paris early in 1954 in order to 
draw up a statement which was con- 
tributed by Unesco to the World 
Population Conference (17). 
Investigations on the relationship 
between family size and intelligence 
test score have been conducted in 
several countries, although the liveliest interest in this problem has been
manifested in Great Britain. Among 
such studies, the outstanding exam- 
ples are the Scottish surveys (81, 82, 
83). In each of these surveys, an ef- 
fort was made to test a complete age 
sample. In 1932 and again in 1947, a 
45-minute group intelligence test 
including verbal and pictorial material was administered to all 11-year-old children in Scotland. The samples actually tested consisted of 87,498 and 70,805 children in the first and second surveys, respectively. These samples are described as complete except for the children whose sensory or motor handicaps precluded the valid administration of this test, those school children who were absent on the day of testing, and a few children attending certain private schools for whom the necessary background data could not be obtained. In terms of the total population of 11-year-old Scottish children, the two samples afforded approximately the same coverage. Since the estimated number of 11-year-old children in Scotland was 100,300 in 1932 and 80,300 in 1947, the percentages covered in the two samples were 87 and 88, respectively.
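The coverage figures can be checked directly from the counts. The short sketch below takes the 1932 sample as 87,498 children, which is an assumption here, and uses the 1947 sample size and the two population estimates given in the text; the variable names are illustrative.

```python
# Children tested and estimated 11-year-old population, by survey year.
# The 1932 sample size (87,498) is assumed; the other figures are from the text.
surveys = {
    1932: {"tested": 87_498, "population": 100_300},
    1947: {"tested": 70_805, "population": 80_300},
}

# Percentage of the age group actually tested, rounded to the nearest whole number.
coverage = {
    year: round(100 * s["tested"] / s["population"])
    for year, s in surveys.items()
}

for year, pct in coverage.items():
    print(f"{year}: {pct} per cent of 11-year-olds tested")
```

On these figures the two surveys reach nearly the same share of the age group, so differences between them are unlikely to stem from unequal coverage alone.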
Surveys with individual tests were likewise conducted on more narrowly defined samples. All children born in Scotland on the first day of February,
May, August, or November, 1926 
were given the 1916 Stanford-Binet 
and a series of eight performance 
tests (63). This group comprised 
874 cases, tested over a three-year 
period at ages ranging from 8-11 to 
11-9. In connection with the 1947 
group test survey, 1,215 children were 
also tested with Form L of the 1937 
Stanford-Binet, as modified for Scot- 
tish children. This group, known as 
the “six-day sample,” included chil- 
dren born in Scotland on the first 
day of the even months in 1936. The 
coverage of the individual test sur- 
veys is more nearly complete than 
that of the group test surveys, special 
efforts having been made to reach 
every child who met the birthday 
specifications. Moreover, the 1,215 
children constituting the “six-day 
sample” of the 1947 survey were stud- 
ied more intensively in terms of 
background characteristics. The 
same children are currently being fol- 
lowed up to determine the effect of 
family size and other factors upon 
their subsequent development (66). 
In the earlier Scottish surveys, no 
data on family size were obtained, al- 
though the results of these surveys 
were employed in the follow-up stud- 
ies to be discussed in a later section 
of this paper. In the 1947 survey, the 
group test² yielded a correlation of -.28 with size of sibship, and the Stanford-Binet a correlation of -.32.
Comparisons of mean scores likewise 
showed a consistent and significant 
drop with increase in family size. 
Thus the mean Stanford-Binet IQ 
dropped steadily from 113 for “only” 
children to 91 for children with five 
sibs (82). On the group test, the 


² Only the verbal part of the group test was included in this analysis. The results obtained with the pictorial part proved to be too highly skewed because of low test ceiling; hence they have not been analyzed further.

mean decrease in score per unit-increase in family size was .130 (83).
The tendency for children from 
larger families to score lower per- 
sisted within occupational classes 
(83), a finding which has also been 
reported by other investigators (14, 
21). All such studies, however, have 
employed fairly broad occupational 
categories. Thus it cannot be as- 
sumed that socioeconomic differences 
have been ruled out when compari- 
sons are made within single occupa- 
tional levels. It is also interesting to 
note that, in the highest occupational 
levels, the negative correlation be- 
tween intelligence and family size 
may disappear or be replaced by a 
positive correlation. Cattell (21) and 
Moshinsky (68) report that the high- 
est negative correlations are found 
in occupations of intermediate so- 
cial status. In this connection, ref- 
erence may again be made to the 
factor of upward social mobility, dis- 
cussed in the Preceding section. It 
is at the intermediate social levels 
that such a factor would be most 
likely to operate.
In the 1947 Scottish survey, group 
test scores were available for 974 
twins (83). Analysis of these scores 
corroborated the usual finding that 
twins score lower than singletons, 
the difference in this group corre- 
sponding to approximately 5 IQ 
points. It was further demonstrated 
that such a difference could not be 
attributed to family size or to socio- 
economic level. There was no tend- 
ency for the twins to come from 
larger families, when number of 
births rather than number of chil- 
dren was considered. Such socio- 
economic indices as relative size of 
home and paternal occupation indi- 
cated no inferiority of the twin sam- 
ple. In fact, there was an excess of 
twins in the professional and em- 
ployer groups, and a deficit in the 


unskilled labor group. The implica- 
tions of these findings will be con- 
sidered below, after other pertinent 
data have been examined. 

Another extensive survey is that 
conducted among 6- to 12-year-old 
French school children in 1943-44 
(36, 38). This survey included 
95,237 boys and girls, approximately 
a 2 per cent sample of the elementary 
school population of France. Care 
was taken to obtain a representative 
distribution of cases over the twenty regions of France. All subjects were given René Gille’s “test Mosaïque,” a specially constructed pictorial group test of intelligence.³ When analyzed in respect to family size, the mean test scores exhibited a consistent drop with increasing size of sibship. This differential was apparent within each year group, the difference in mean score between a one-child family and a family of eight or more being equivalent to 1 or 2 years of mental age. Within occupational groups, the negative relationship between intelligence and family size was clearly apparent among farmers, manual laborers, and clerical workers; it was barely discernible in the managerial group and negligible in the professional class.

A special study was concerned with the length of intersibling interval (36, Ch. 3; 90). Within the total sample covered by the French survey, there were 1,244 two-sibling families in which both siblings had been tested. These were separated into “long interval” and “short interval” sibships, the latter being defined as those falling at or below

³ This test is described and completely reproduced in the first report of the survey (38, pp. 75-96). Data on reliability and validity are included in the second report (36, pp. 21-45). It was called the “mosaic” test because it includes a wide variety of items.

the median interval. In the urban
group the median interval was 24 
months, in the rural 23 months. On 
the intelligence test, the children 
with long intersib intervals obtained 
significantly higher means, these 
differences persisting within each of 
the five occupational categories into 
which the sample was subdivided. 
With long intersib intervals, the 
Scores approximated those of only 
children; with short intersib inter- 
vals, they approximated the scores 
obtained by three-child  sibships. 
Scores were also available for twins (36, Ch. 3; 90). In this group, family size averaged higher than for the total sample, even when the twins were counted as a single birth. Analysis of paternal occupation showed a greater proportion of farmers in the twin sample than in the total sample. With regard to all other occupational classes, however, the twin sample was superior to the total sample. Thus a higher percentage of twins had fathers in the managerial and professional classes, and a lower percentage in the clerical and manual labor categories, than was the case in the total sample. On the whole, then, the findings with regard to occupation are similar to those of the Scottish survey, in which paternal occupation was likewise superior for twins rather than for singletons.

The test scores of the twins were inferior to those of the total sample. This difference was found at all ages and remained when family size and occupational level were held constant. In the case of families with four or more children, however, the twin inferiority was negligible, the difference being most pronounced in the case of two- and three-sib families. Moreover, among the two-sib families, the twin scores were very similar to those of short-interval sibs, as defined in the earlier part of the study.

Other similar surveys have been 
conducted on a less extensive scale 
in several countries. In England, 
there have been a number of at- 
tempts to test complete age samples 
of children within restricted areas. 
Fraser Roberts and his co-workers 
(77, 79) administered the Otis test 
to all children born during a four- 
year period and living within the 
city of Bath; a subsample was also 
individually tested with the Stan- 
ford-Binet. The Cattell Culture-Free 
Test was used by Cattell (18, 19) 
with all 10-year-old children in Lei- 
cester, a typical industrial city, and 
Devonshire, an “unspoilt rural area.” 
Sutherland and Thomson (89) ap- 
plied a specially developed group in- 
telligence test to all 11-year-old 
school children in the Isle of Wight. 
Burt (14) reports the results obtained 
with the elementary school popula- 
tion of one of the London boroughs 
in the course of his standardization 
of the Binet scale. The Otis test 
scores of 393 10-year-olds in Sheffield 
were analyzed by Bradford (12). All 
of these studies confirm the findings 
of the more extensive surveys regarding the decline in mean intelligence test score with increasing size of sibship. When correlations were computed between test score and family size, the coefficients were uniformly negative and clustered in the .20’s and low .30’s.
A similar inverse relation between intelligence and family size is reported by Papavassiliou (72), who
gave a Greek adaptation of the Stan- 
ford-Binet to 349 children in Athens. 
Heinen (37) found a tendency for 
family size to be inversely related to 
school grades and other indices of 
academic ability in several large 
samples of German school children. 
In another survey of German school children, Eydt (34) reported more
retardation and behavior problems 
among children from large families. 

Several American studies have 
yielded similar results. Lentz (54) obtained a correlation of -.304 between family size and intelligence test scores among 4,330 urban school children in five states. Within separate cities, however, the correlations varied from -.065 to -.411. This finding is not surprising in view of the fact that both the tests and the examiners varied from school to school. Chapman and Wiggins (24) reported a correlation of -.33 between National Intelligence Test scores and size of family among 650 school children. When the correlation was recomputed for children with native-born and with foreign-born parents, it fell to


man-McNemar intelligence test 
scores was obtained by Damrin (26) in a group of 156 high school girls.


Hirsch (40) found a consistent decrease in mean Otis IQ with increasing family size in a group of 214 Ten-
The mean dropped from 115.8 for one-child families to 96.6 for eight-child families. Among still larger families, the
mean varied irregularly, dropping to 
82.0 for twelve-child families. In a 
study of more than 20,000 retarded 
children in the Massachusetts public 
schools, Dayton (27) noted a de- 
crease in mean IQ from 73.2 for one- 
child families to 68.9 for families of 
ten or more children. 

A more thorough analysis of the 
Problem is provided in a study by 
Conrad and Jones (25), who tested 
representative samples of the rural 
Population of three New England 
states. The children were examined 
with the Stanford-Binet, and the 
adults with the Army Alpha. Family 
size was investigated in relation to 
intelligence test score of mother and 
father, education of mother and fa- 
ther, a social status index of the fam- 
ily, and mean intelligence test score 
of the sibship. In uncompleted fam-
ilies, all of these measures correlated
negatively with family size, although
only the correlations with father’s
intelligence and with social status
reached statistical significance. In
the completed families, on the other
hand, all correlations were positive,
but none was statistically significant.
Further analyses showed that the 
families of lower intelligence began
having children at an earlier age,
but that their total child-bearing
period was no longer than in the more
intelligent families. These conclu-
sions, of course, must be restricted to
the particular population under in-
vestigation.

Similar results were obtained, how-
ever, in a study of California fami-
lies reported by Willoughby (108).
Parental ages, the 
author concluded that most of the 
ili Parental 
intelligence was determined from the 
combined score on six verbal and 
five nonverbal tests. Number of 
children correlated —.11 with the
test scores of 108 mothers and +.06
with the scores of 87 fathers, neither 
correlation being statistically signifi- 
cant. 

Another study in which intelli- 
gence test scores of parents were em- 
ployed is that of Willoughby and 
Coogan (109). Follow-up data were 
gathered on 373 persons who had 
graduated from high school 12 years 
earlier and who had taken an intelli-
gence test while still in school. No
relationship was found between the
original test scores and age at mar-
riage, age at birth of first child, or
number of offspring at the time of 
the survey. It is also interesting to 
note that within the same group a 
low but significant negative correla- 
tion was found between intelligence 
test score and the number of siblings 
of the subjects themselves. More-
over, the subsequent education and
the occupational level of the subjects
showed an inverse relation to num-
ber of offspring. Although this study
is inconclusive because of the small
and selected sample and because of
the incompleteness of the families, it
nevertheless represents a promising
approach to the problem.⁴ Its chief
merits lie in the fact that intelligence
of parents was investigated directly
and in the fact that such intelligence
was measured prior to educational
and vocational differentiation.

4 Cf. critique of this study by Fraser Rob-
erts (78), but note that one of the criticisms
is based upon a confusion between number of
siblings and number of offspring.

5 Personal communication from Dr. Charles
F. Westoff, November 4, 1955. Cf. also (67).

An investigation currently in prog-
ress provides further information on
the relationship between intelligence
of parents and number of offspring.
The original data for this study were
gathered by E. Lowell Kelly as part
of a broader project involving the
testing, interviewing, and subse-
quent 20-year follow-up of 300 en-
gaged couples living in New England
(46). The data pertaining to fertility, 
based on 216 couples, are being proc- 
essed by Charles F. Westoff and 
Elliot G. Mishler of the Office of Pop- 
ulation Research, Princeton Uni- 
versity. At the time of the initial 
testing, the mean age of the men was 
26.6 years and that of the women 
24.6, most of the subjects falling 
within the age limits of 21 and 30. 
Since data on number of offspring 
were obtained 20 years later, it can 
be assumed that most of the families 
were completed. Educationally, the 
group represented a superior sample, 
more than half of the men and ap- 
proximately a third of the women 
being college graduates at the time 
of initial testing. The correlation be- 
tween initial Otis Intelligence Test 
scores and subsequent number of 
live births was .19 for the 216 men 
and .17 for the 216 women, both be- 
ing significant at approximately the 
.01 level. These correlations thus 
suggest a slight tendency for the 
brighter parents to have larger fami- 
lies. It is noteworthy that none of 
the studies in which parental intelli- 
gence was correlated with number of 
offspring has so far yielded the nega- 
tive correlation customarily found 
when child intelligence is correlated 
with size of sibship.
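
The significance levels quoted for these correlations can be checked with the usual t test for a correlation coefficient. The following is a minimal sketch, assuming a two-tailed test against zero with N - 2 degrees of freedom; the function name is illustrative and does not come from the studies cited.

```python
import math

def t_for_r(r: float, n: int) -> float:
    """t statistic for testing a Pearson r against zero (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Correlations reported in the text: Otis score vs. number of live births.
t_men = t_for_r(0.19, 216)
t_women = t_for_r(0.17, 216)

# Willoughby's smaller California samples, for comparison.
t_mothers = t_for_r(-0.11, 108)
t_fathers = t_for_r(0.06, 87)

print(round(t_men, 2), round(t_women, 2), round(t_mothers, 2), round(t_fathers, 2))
```

With roughly 200 degrees of freedom the two-tailed .01 critical value of t is about 2.6. The value for the men exceeds it and the value for the women falls just short, which agrees with the phrase "approximately the .01 level"; the two values for the California parents are far below it.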
The relationship between fertility 
and intelligence has also been investi- 
gated within certain special groups. 
Moshinsky (68) analyzed the test 
scores of 10,159 English children en- 
rolled in different types of schools. 
In the relatively heterogeneous ele- 
mentary school group, a correlation 
of —.23 was found between intelli- 
gence and family size. Within more 
advanced, fee-paying schools, the 
correlations were negligible. Both 
Moshinsky and Blackburn (8) cite 
the latter finding in support of cer-
tain hypotheses regarding the rela-
tionship between intelligence test
performance and family size. As 
Burt (15) points out, however, such 
a result is to be expected because of 
the greater intellectual homogeneity 
of the children in the more advanced 
schools. 
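
Burt's point about homogeneity can be illustrated by simulation. The sketch below assumes standardized, normally distributed scores with a population correlation of -.23 (the elementary school value quoted above) and then keeps only the top fifth on the intelligence variable. The cut-off, the sample size, and the normal model are assumptions made for illustration, not features of Moshinsky's data.

```python
import math
import random

def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate(rho, n, seed=0):
    """Draw n (intelligence, family size) pairs in standard-score form."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        iq = rng.gauss(0.0, 1.0)
        size = rho * iq + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        pairs.append((iq, size))
    return pairs

pairs = simulate(rho=-0.23, n=20000)
r_full = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])

# Hypothetical selective school: only the top fifth in intelligence is admitted.
cutoff = sorted(p[0] for p in pairs)[int(0.8 * len(pairs))]
selected = [p for p in pairs if p[0] >= cutoff]
r_selected = pearson_r([p[0] for p in selected], [p[1] for p in selected])

print(round(r_full, 3), round(r_selected, 3))
```

Under these assumptions the correlation within the selected group is much closer to zero than in the full group, although the underlying relationship is the same for every simulated child.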

Hutton (42) reports an insignifi- 
cant correlation of —.05 between IQ 
and number of siblings in the case of 
boys attending a British secondary 
grammar school. The group is de- 
scribed as falling within the upper 
half of the general population in in- 
telligence. A questionnaire survey of 
alumni of the same school indicated 
that the highly selected scholarship 
holders as a group tended to have 
a smaller number of offspring than 
the general population of Great 
Britain. But among the more recent 
graduates the reverse was true, the 
scholars showing a higher replace- 
ment rate than the generality. 

A number of studies on college 
students, in both Great Britain and 
the United States, have yielded very 
low and usually insignificant correla- 
tions between intelligence test scores 
and number of siblings (39, 64, 65, 85,
102, 103). This lack of correlation is 
understandable in the light of the 
highly selected nature of such sam- 
ples. Not only are college students 
selected intellectually; they are also 
selected with respect to the constella- 
tion of psychological and socioeco- 
nomic factors which determine who 
goes to college. Some of these factors 
may themselves be directly related 
to family size. 

In their study of California public 
school children with Stanford-Binet 
IQ’s of 140 or higher, Terman and 
his co-workers found a correlation of 
—.271 between IQ and size of sibship
within 91 completed families (91).
Through the subsequent follow-ups
of the entire gifted group, additional
data have been gathered on the com-
pleted family size of nearly all of
the original cases. These data, how- 
ever, have not yet been analyzed. 
The forthcoming fifth volume of 
Genetic Studies of Genius will contain 
extensive data on the relation be- 
tween intelligence and number of 
offspring of the gifted subjects them-
selves.⁶

In connection with a slum clear- 
ance survey, Dawson (28) adminis- 
tered the Stanford-Binet to 1,239 
Glasgow children, aged 3 to 14. Most 
of the children were tested just before 
their families moved to a new hous- 
ing development. The correlation 
between these IQ’s and the number 
of living sibs was —.12; within a sub- 
group of 140 completed families, the 
correlation was —.10. The corre-
sponding correlations with total num-
ber of births (including stillbirths)
were —.20 and —.30, respectively. 
O'Hanlon (71) reports the results of 
retesting 293 of these children eight 
years later. This group yielded a
correlation of —.137 between IQ and 
number of living sibs, the correlation 
rising to —.291 in a subgroup of 28 
completed families. The correspond-
ing correlations with total number of
births were —.207 and —.413. The 
results of both Dawson and O’Han- 
lon illustrate the fact that differential 
fertility is partly offset by a differ-
ential death rate during childhood.

6 Personal communication from Dr. Lewis
M. Terman, April 28, 1955.

In an early study, Pearson and
Moul (73) found no significant cor-
relation between size of sibship and
teachers’ ratings of intelligence
among 1,202 Jewish school children
in London. The children were largely
of foreign-born parentage and came
from relatively homogeneous and
low socioeconomic levels. Fertility
was uniformly high in the group as
a whole. Sutherland (87) reports
correlations of —.129 and —.126 
between IQ and family size in two 
samples of children of British coal 
miners totaling 3,096 cases. These 
correlations are consistent with the 
findings of other investigators inso- 
far as the relationship is somewhat 
lower in the poorest socioeconomic 
levels than in the intermediate levels. 
In a study of 354 children in a 
Jewish orphanage in New York City, 
Locke and Goldstein (56) found a 
correlation of —.13 between Stan- 
ford-Binet IQ and size of sibship. 
When both mother’s age and order 
of birth were partialled out, this cor- 
relation rose to —.24. It is rather 
difficult, however, to interpret this 
partial correlation. For compara-
bility with correlations obtained in
other studies, moreover, the original 
correlation of —.13 would seem pref- 
erable. 

Sutherland (88) analyzed the IQ’s
of two samples of fatherless children.
The first consisted of 123 12- to 14-
year-old Glasgow school children
whose fathers had died when the
children were under one year of age.
The second sample comprised 724
Yorkshire school children aged 11 to
13 who were also fatherless but not
necessarily from infancy. Each sam-
ple was compared with a control
group of children both of whose par-
ents were living. The mean IQ of the
fatherless children was lower than
that of control children with the same
number of siblings. The correlations
between IQ and family size were sig-
nificantly negative, but lower among
fatherless children than among the
controls. The respective correlations
were —.188 and —.26 in the Edin-
burgh samples, and —.19 and —.23
in the Yorkshire samples.

In discussing these results, Suther-
land writes, “The tendency for the
brighter children to come from the
smaller families has been reduced by
the presence of small families whose 
natural increase has been prevented 
by death and whose children would 
normally have had their place among 
the larger families” (88, p. 168). A 
similar explanation may account for 
the lower IQ’s of the fatherless chil-
dren within each sibship size. The
test performance of such children 
resembled that of children in larger 
families, to which they would have 
belonged if family growth had not 
been cut off by the father’s death. 
This type of study represents one 
step toward the analysis of the com- 
plex factors determining the correla- 
tion between intelligence and family 
size. Its findings suggest that it is 
not size of sibship per se but other 
factors associated with family size 
within a given culture which produce 
the obtained differentials in intel- 
lectual level. 


FOLLOW-UP STUDIES WITH
INTELLIGENCE TESTS 


On the basis of the negative cor- 
relation between intelligence test 
scores and family size, several writers 
predicted a gradual decline in the in- 
tellectual level of the population (14, 
18, 19, 20, 54, 77, 79, 92, 100). The 
estimated drop varied from about 2 
to 4 IQ points per generation. A
direct test of such a predicted decline
is provided by follow-up surveys in 
which comparable samples have been 
tested under similar conditions after 
a lapse of several years. The most
extensive and best controlled data on
this question are to be found in the 
previously described Scottish sur- 
veys (81, 82, 83, 93, 94). Rather 
than showing the predicted decline, 
however, the scores revealed a small
but significant improvement from 
the 1932 to the 1947 surveys. 

Other similar follow-ups have been 
reported by Burt (14), Cattell (23),
and Emmett (33). In comparison
to the Scottish surveys, these studies 
covered somewhat shorter intervals 
and used smaller samples of ques- 
tionable comparability. When test- 
ing is conducted within a limited area, 
such as a single county, city, or bor- 
ough, successive samples may differ 
owing to selective migration, dete- 
rioration of neighborhoods, slum clear- 
ance, changes in proportion of chil- 
dren institutionalized or attending 
private schools, and other similar 
factors. Moreover, differences in 
thoroughness of sampling procedures 
in successive surveys may introduce 
differential selection. For example, 
subjects in lower socioeconomic levels 
and those who are physically less fit 
are more likely to be excluded when 
sampling coverage is less adequate. 
Despite certain local variations in 
results, however, these studies also 
failed to substantiate the expected 
decline in test score. 

Reference may likewise be made 
to certain relevant data gathered in- 
cidentally, as by-products from other 
types of testing programs. Thus an 
analysis of the intelligence-test per- 
formance of American soldiers in 
World Wars I and II indicated that 
the level of performance had im- 
proved to such an extent that the 
median score of the later sample cor- 
responded to the 83rd percentile of 
the earlier sample (95). This rise in 
score paralleled an increase in amount 
of education, the mean being at the 
8th grade for the first sample and at 
the 10th grade for the second. It is 
also noteworthy that a survey of the 
intelligence test performance of 
American high school students over 
a 20-year period suggested that this, 
too, had improved, despite the 
marked increase in the proportion of 
students enrolled in high school (35). 
Since a larger proportion of the total 
population was attending high school
at the end than at the beginning of
this 20-year period, a decrease in
mean score would be expected, unless 
the total population had improved 
sufficiently to counteract such a drop. 
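
The Army comparison can be restated in standard deviation units. The sketch below assumes that scores in the earlier sample were approximately normally distributed, which is an assumption of this illustration and not a claim made in the study cited. Under that assumption, a later median falling at the 83rd percentile of the earlier distribution corresponds to the shift computed here.

```python
from statistics import NormalDist

# Assumption: the earlier (World War I) score distribution is roughly normal.
# The later median sits at the 83rd percentile of that distribution.
shift_in_sd = NormalDist().inv_cdf(0.83)

print(round(shift_in_sd, 2))
```

On this reading the typical soldier in the later sample scored a little under one standard deviation of the earlier distribution above the earlier median.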

When educational and other en- 
vironmental conditions within a given 
community improve conspicuously, 
larger rises in test scores may be ob- 
served over even shorter intervals. 
Thus a 10-year follow-up conducted 
in a relatively isolated mountainous 
region of eastern Tennessee showed 
a 10-point rise in median IQ (105). 
During this interval, the socioeco- 
nomic and educational conditions of 
this region had greatly improved. 
The second sample tested in this sur- 
vey was closely comparable to the 
first, the children coming largely from 
the same families in the two cases.

Similar results are to be found in 
the previously cited slum survey con- 
ducted by Dawson (29). A subgroup 
of 289 4- to 9-year-old children who 
had been tested with the Stanford- 
Binet at the time of their transfer 
from the slum area was retested after 
12 to 18 months of residence in im-
proved housing. A control group of
56 cases who had not been transferred
from the slum area was also retested
over the same interval. The slum
clearance group showed a significant
mean rise of 1.5 IQ points, while the
control group showed no significant
change. To be sure, the mean
rise in the clearance group, though
significant, was small. But it should
be borne in mind that the retest was
conducted over a relatively short
period of time. Moreover, it is note-
worthy that the improvement of the
slum clearance group was also mani-
fested on tests of arithmetic and
reading, and persisted when com-
parisons were made separately for
boys and girls in all three variables.
In the control group, on the other
hand, about half of these compari-
sons yielded positive changes and
half negative changes.

That intelligence test performance
is susceptible to improvement as a
result of education has been repeat-
edly demonstrated by a variety of
methods (4, Ch. 8). Of special inter-
est are two investigations in which
persons who had been tested follow-
ing a uniform period of universal
education were retested after 10
years in one case and after 20 years
in the other (41, 58). In the interven-
ing years, members of these groups
had received varying amounts of
further education. Both studies re-
port significant relationships between
amount of subsequent schooling and
test performance, when initial scores
are held constant.

One of the hypotheses proposed to
account for the rise in mean score
found in the Scottish surveys and
similar follow-ups is that of test
sophistication. Thus it is argued
that subjects tested in the later
surveys had the advantage of greater
familiarity with psychological tests.
With regard to the Scottish surveys
themselves, a special analysis of the
scores obtained by children who had
and those who had not been previ-
ously tested with a similar test
showed little effect of such test ex-
perience (83, pp. 121-124). In a
more extensive study of the
same question, two subgroups within
the 1947 Scottish sample were com-
pared. One was taken from Educa-
tional Authorities in which only 2.5%
of the children were reported to have
had previous test experience within
[...] of the survey; the other in-
cluded Educational Authorities in
which 67% of the children had had
such experience. The mean rise in
score from 1932 to 1947 in the two
subgroups was 1.5 and 3.4 points,
respectively, both being significant
at the .01 level.⁷
7 Personal communication from Mr. James 


toss h University of Edinburgh, May 17, 


Fundamentally, the question of 
test sophistication and the related 
question of coaching must be an- 
swered in terms of breadth of im- 
provement (cf. 1; 2, pp. 52-56). Any 
influence which is restricted to the 
test performance itself and does not 
correspondingly affect the criterion 
behavior which the test is designed 
to predict would of course reduce the 
validity of the test. But the broad 
social and educational changes which 
have occurred in the time intervals 
under consideration can be expected 
to affect the individual's over-all 
intellectual development, rather than 
being limited to the particular be- 
havior samples covered by specific 
tests. 

Carrying the argument a bit fur- 
ther, some writers have argued that 
“true, innate” intelligence may still 
have declined, despite the rise in in- 
telligence test performance brought 
about by improved environmental 
conditions. Some have gone so far as 
to suggest that if test scores show 
improvement under such conditions, 
then the tests are at fault. This is 
analogous to arguing that the “true,
innate” height of the population has 
declined as a result of the negative 
correlation between height and fam- 
ily size, and that the observed rise in 
mean height is illusory. Moreover, 
it could then be argued that we 
should devise a “culture-free” meter-
stick to measure innate height freed
from environmental influences. Such 
reasoning does not make scientific 
sense. When the psychologist speaks 
of intelligence, he refers to certain 
properties of observable behavior. 
Such behavior is by its very nature 
susceptible to environmental influ- 
ences. The same influences which 
bring about a rise in test score also 
cause an improvement in the quality 
of the individual's general intellectual 
functioning (cf. 31, 75, 100). 

Penrose (74, 75, 76) has argued
that, even from a genetic point of
view, the negative correlation be- 
tween intelligence test scores and size 
of sibship need not imply a decline in 
the intellectual level of the popula- 
tion. Taking into account assortative 
mating and the infertility of low- 
grade mental defectives, he presents 
a simplified genetic model by which 
intellectual level remains constant 
despite the existing fertility differen- 
tials. Essentially, such a situation 
results from the fact that some par- 
ents of average or borderline intelli- 
gence do have above-average chil- 
dren. In a much earlier article by 
Willoughby and Goodrie (110), a 
similar argument is presented. Be-
ginning with a correlation of —.17
between intelligence and family size, 
the authors set up a genetic model 
based upon the combined contribu-
tion of five gene pairs. They then
trace the effects mathematically 
through five generations, using two 
marital coefficients, .37 and .58. With
the former coefficient, a decline in 
mean intelligence results. With the 
latter, which is much closer to em-
pirically obtained husband-wife cor-
relations in intelligence test scores,
there is actually a slight rise in mean
intelligence.


METHODOLOGICAL PROBLEMS 


Investigations on the relationship 
between family size and intelligence 
present a number of special meth- 
odological problems with regard to 
sampling and statistical analysis. A 
source of difficulty in several studies 
is the inclusion of incomplete families 
in the sample. This factor may oper- 
ate in a number of different and not 
entirely predictable ways. All chil- 
dren from such incomplete families 
are, of course, classified as coming 
from somewhat smaller families than 
will eventually be the case. This may 
have the effect of classifying lower 
test-scorers (from the ultimately
larger families) into a category of
higher test-scorers (from smaller fam- 
ilies). Such a procedure would thus 
reduce the obtained differences in 
test scores between the various fam- 
ily-size categories. On the other 
hand, when a negative correlation is 
found between intelligence and fam- 
ily size in a sample of incomplete 
families, it may simply reflect the 
tendency for persons of lower intelli- 
gence to begin having children ear- 
lier. Should such persons also stop 
having children earlier than the more 
intelligent members of the commu- 
nity, the correlation would disappear 
when completed families are studied. 
In the previously reported investiga- 
tion by Conrad and Jones (25) in 
rural New England communities, 
such was actually the case.

It should be noted, of course, that
if the less intelligent parents begin
having children earlier than the more
intelligent parents, they will con-
tribute more individuals to the popu-
lation in the long run, through more
frequent generations. For example,
if each successive generation begins
to bear children at age 20, there will
be five generations per century; but
if child-bearing begins at age 25,
there will be only four generations.
Nevertheless, the effect upon the
intellectual level of the population
would be less if the duller parents
merely began to have children earlier
than it would be if the duller also had
larger completed families.
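
The arithmetic of generation length can be sketched as follows. The figure of three children per couple and the treatment of generations as discrete and non-overlapping are assumptions made purely for illustration; neither appears in the studies cited.

```python
def generations_per_century(age_at_childbearing: int) -> float:
    """Number of generations that fit into 100 years."""
    return 100 / age_at_childbearing

def relative_descendants(years: int, age_at_childbearing: int,
                         children_per_couple: float) -> float:
    """Descendants per original individual after the given number of years.

    Each completed generation multiplies the line by children_per_couple / 2,
    since two parents are needed for each family.
    """
    generations = years // age_at_childbearing
    return (children_per_couple / 2) ** generations

# Same completed family size (hypothetical), different age at first birth.
early = relative_descendants(100, 20, 3)
late = relative_descendants(100, 25, 3)

print(generations_per_century(20), generations_per_century(25), early, late)
```

With identical completed family sizes, the line that begins child-bearing at 20 fits in one more generation per century and so leaves more descendants, which is the point made in the paragraph above.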

Age of parents is a factor which
needs to be considered in its own
right. The smaller families in a sam-
ple may include a relatively large
proportion of incomplete families of
younger parents. Such parents are
likely to be better educated than the
older parents of the completed fami-
lies, owing to the progressive rise in
educational level of the population.
The better educated parents may
in turn provide a more favorable en-
vironment for their offspring. Thus
when incomplete families are in- 
cluded in the sample, the differences 
in parental age, coupled with the ris- 
ing level of general education, may 
account at least in part for the nega- 
tive correlation found between fam- 
ily size and intelligence. 

A second major problem pertains
to selective factors. Since the correla-
tions in question are generally quite
low, the operation of selective fac-
tors, however slight, may produce a
completely spurious result. An ex-
ample of such a subtle selective factor
is found in Shuey’s study of students
in an American women’s college (85).
Within the entire group of 2,261 en-
tering students for whom test scores
were available, a significant inverse
relation was found between test
score and number of siblings. This
relation disappeared, however,
when the records of those whose
older sisters had attended the same
college were excluded. Presumably
somewhat lower entrance require-
ments had been applied to those
whose siblings had previously ma-
triculated. Such students would also
be likely to come from larger families.

Nor are more general samples free
from selective factors. Thus in a
sample of children between, let us
say, the ages of 8 and 14, individual
families will appear repeatedly
in the sample, the number of repe-
titions being directly propor-
tional to the size of the family. A
family containing five school-age
children, for example, would appear
five times in the sample. Any chance
conditions characterizing individual
large families would thus be spuri-
ously magnified in the sample. Nor
would ordinary sampling statistics
be applicable to such samples.
It was partly to avoid such difficulties
that single-age samples, such as 11-
year-olds, have been utilized.

Even carefully chosen single-age
samples, however, are not entirely
free from selective bias. In such a 
sample, for example, large families 
will be overrepresented, since by 
chance a child of any specified age is 
more likely to be found in a large 
family than in a small one. Hence 
the sample is not representative of 
existing families, nor of the parental 
population. With regard to the child 
population, it can be argued that 
large families should be overrepre- 
sented, since they contribute more 
members to the population. Never- 
theless, certain types of comparisons 
may be distorted through the use of 
a single-age sample. The reports of 
the Scottish survey contain a discus- 
sion of the possible effects of such 
selective biases upon the analysis of 
birth order (82) and maternal age 
(83) in relation to child’s intelligence. 
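
The overrepresentation of large families in a sample of children can be made concrete with a small calculation. The family-size counts below are invented for illustration, and the sketch makes the simplifying assumption that every child in the population is equally likely to be drawn.

```python
from fractions import Fraction

# Hypothetical population: number of families having each number of children.
families_by_size = {1: 40, 2: 30, 3: 15, 4: 10, 5: 5}

total_families = sum(families_by_size.values())
total_children = sum(size * count for size, count in families_by_size.items())

# Share of all families that have a given size.
share_of_families = {
    size: Fraction(count, total_families)
    for size, count in families_by_size.items()
}

# Share of all children who belong to a family of a given size.
share_of_children = {
    size: Fraction(size * count, total_children)
    for size, count in families_by_size.items()
}

for size in families_by_size:
    print(size, float(share_of_families[size]), round(float(share_of_children[size]), 3))
```

In this invented population five-child families are 5 per cent of families but supply about 12 per cent of children, while one-child families are 40 per cent of families and supply about 19 per cent of children. A sample drawn from children therefore describes the child population, not the population of families or of parents.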

Parenthetically, it may be added 
that studies on the relation of birth 
order to intellectual and other psy- 
chological characteristics have fre- 
quently yielded ambiguous and in- 
consistent results because of the 
failure to take family size into ac- 
count. Let us suppose that one group 
consists entirely of two-child families, 
while a second group contains only 
four-child families. Obviously, in the 
first group 50 per cent of the children 
will be first-born, while in the second 
there will be only 25 per cent. In the 
Scottish survey (82), the child’s posi- 
tion in his sibship was recorded as 
1/4, 2/5, etc., signifying the first-born in
a sibship of four, the second-born in 
a sibship of five, and so forth. Com-
parisons were then made within each
family size. Under these conditions, 
the first-born and last-born scored 
higher than the intermediate sibs. 
But even these results are inconclu- 
sive because of the previously men- 
tioned sampling biases. 
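
The dependence of birth-order proportions on family size follows from simple counting, as in this minimal sketch. The two groups of ten families are hypothetical, and every family is assumed to be complete with exactly one first-born child.

```python
from fractions import Fraction

def first_born_share(family_sizes):
    """Fraction of all children who are first-born, given one size per family."""
    return Fraction(len(family_sizes), sum(family_sizes))

two_child_group = [2] * 10    # ten hypothetical two-child families
four_child_group = [4] * 10   # ten hypothetical four-child families

print(first_born_share(two_child_group), first_born_share(four_child_group))
```

Any raw comparison of first-born with later-born children across two such groups would therefore also be a comparison of small families with large ones, which is why the Scottish survey compared birth positions within each family size.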

The computation of predicted in- 
tellectual decline involves question- 
able assumptions. The usual pro-
cedure is based essentially upon the
comparison of two means obtained 
from the same set of scores (cf., 
e.g., 14, 19). The mean of the sub- 
jects’ scores can of course be com- 
puted directly. The mean of the 
next generation is estimated by mul- 
tiplying the frequency in each class- 
interval of the IQ distribution by 
mean family size in that interval and 
using these new frequencies to find 
the mean. Similarly, the mean score 
of the parents of the present subjects 
is estimated by dividing the original 
frequencies by family size and mul- 
tiplying by two. 
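
The procedure just described can be sketched in a few lines. The class-interval midpoints, frequencies, and mean family sizes below are invented for illustration and do not reproduce any of the published computations.

```python
def weighted_mean(values, weights):
    """Mean of values, each weighted by the matching entry in weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical IQ class-interval midpoints, number of tested children in each
# interval, and mean family size reported by the children in that interval.
midpoints = [70, 85, 100, 115, 130]
frequencies = [10, 40, 100, 40, 10]
mean_family_size = [4.5, 4.0, 3.5, 3.0, 2.5]

# Mean of the tested children, computed directly.
observed_mean = weighted_mean(midpoints, frequencies)

# Next generation: each frequency is multiplied by mean family size.
next_freqs = [f * s for f, s in zip(frequencies, mean_family_size)]
predicted_next_mean = weighted_mean(midpoints, next_freqs)

# Parents: each frequency is divided by family size and multiplied by two.
parent_freqs = [2 * f / s for f, s in zip(frequencies, mean_family_size)]
estimated_parent_mean = weighted_mean(midpoints, parent_freqs)

print(observed_mean, round(predicted_next_mean, 1), round(estimated_parent_mean, 1))
```

With these invented figures the observed mean is 100, the predicted mean of the next generation is about 98.3, and the estimated parental mean is about 101.8. The apparent decline comes entirely from reweighting a single set of scores; the multiplication by two does not affect the parental mean, since a constant factor cancels in a weighted average.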

Such a procedure is based upon the 
assumption of a perfect parent-child 
correlation in both intelligence test 
score and family size. In neither case 
is the assumption consistent with 
available information. Parent-child 
correlations in intelligence test scores 
are generally in the .50’s (cf. 4, pp. 
318-320). With regard to fertility, 
parent-child discrepancies may arise 
from various sources, including the 
decrease in fertility differentials 
among socioeconomic classes and the 
social mobility of individuals. Avail- 
able data indicate that when persons 
move from one culture or subculture 
to another, their fertility tends to be 
intermediate between that of the 
group from which they have come 
and that of the group which they 
join. 

Another noteworthy methodologi- 
cal point pertains to the definition of 
mean family size. When size of sib- 
ship is recorded for each child and 
these sizes are averaged, the mean 
family size per child is obtained. On 
the other hand, when number of chil- 
dren in each family is recorded, as in 
a census survey, the average of these
numbers gives mean family size per
family. Thus if we have a sample 
of 5 families consisting of 1, 2, 3, 4, 
and 5 children, respectively, the me-
dian family has 3 children. But the
median child, out of the total 15 in 
the sample, belongs to a family of 4. 
The corresponding means are 3 for 
families and 3.67 for children. The 
latter is the contraharmonic mean 
discussed by Jaspen (44, 45) and 
bears a direct arithmetic relation to 
the former. If X represents number 
of children in each family and N 
represents number of families, the 
mean per family is ΣX/N and the
mean per child is ΣX²/ΣX. In mak-
ing comparisons between mean fam-
ily sizes of populations or segments
of a population, it is obviously essen-
tial to insure that the same type of
mean is employed. This point is
especially likely to arise when results
obtained with a child sample are com-
pared to national census figures,
since the former are usually expressed
as mean per child, the latter as mean
per family.
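
The five-family example given above can be verified directly. This is a minimal sketch using only the figures in the text; the variable names are illustrative.

```python
from statistics import mean, median

# The sample described in the text: five families of 1, 2, 3, 4, and 5 children.
family_sizes = [1, 2, 3, 4, 5]

# Mean per family: sum of X divided by N.
mean_per_family = mean(family_sizes)

# Mean per child (contraharmonic mean): sum of X squared divided by sum of X.
mean_per_child = sum(x * x for x in family_sizes) / sum(family_sizes)

# One entry per child, each holding the size of that child's own family.
sibship_size_per_child = [x for x in family_sizes for _ in range(x)]

median_family = median(family_sizes)
median_child = median(sibship_size_per_child)

print(mean_per_family, round(mean_per_child, 2), median_family, median_child)
```

Averaging the per-child list gives the same result as the contraharmonic formula, which shows that the mean per child is simply the ordinary mean taken over children instead of over families.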

Attention should likewise be given
to certain problems dealing with the
choice and application of psychologi-
cal tests in studies on fertility and
intelligence. An intrinsic feature of
the design of the Scottish surveys in-
volved the transmutation of scores
from a group test, administered to
the entire sample, to Stanford-Binet
IQ’s obtained from a cross section of
the larger sample. It was apparently
felt that the scores of the entire sam-
ple would gain in meaningfulness if
put in terms of such IQ’s. Apart from
the wider familiarity of the IQ, how-
ever, it is difficult to see what such a
conversion accomplishes. Moreover,
a number of specific objections may
be raised against this procedure.
Since different intelligence tests meas-
ure a somewhat different combina-
tion of functions, individuals cannot
be expected to retain the same rela-
tive standing on such tests. Intelli-
gence tests vary not only in
scale and in normative population,
but also in content. The reliability
and validity of the converted scores 
depend upon the properties of the test 
which was actually administered, 
rather than upon the properties of 
the test into which the scores were 
converted. For example, if an indi- 
vidual test is more precise because 
of its better control of rapport and 
motivation, longer testing time, or 
any other favorable conditions, these 
characteristics are not transferred to
the group test by transmutation of
scores. Nor is the use of the tradi-
tional ratio IQ advisable for such
purposes. Standard scores, expressed
as deviation IQ’s or in some other
convenient form, have far more to 
recommend them for precise measure- 
ment. 
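
A deviation IQ of the kind recommended here is a linear rescaling of standard scores. The sketch below assumes a scale with mean 100 and standard deviation 15 and uses the tested group itself as the normative population; both choices, like the raw scores, are assumptions of this illustration and are not specified in the text.

```python
from statistics import mean, pstdev

def deviation_iqs(raw_scores, iq_mean=100.0, iq_sd=15.0):
    """Convert raw scores to deviation IQs relative to the group itself."""
    m = mean(raw_scores)
    s = pstdev(raw_scores)  # standard deviation of the group treated as the norm
    return [iq_mean + iq_sd * (x - m) / s for x in raw_scores]

# Hypothetical raw scores on a group test.
raw = [32, 41, 45, 50, 57]
iqs = deviation_iqs(raw)

print([round(v, 1) for v in iqs])
```

Because the conversion is linear, each individual keeps the same relative standing as on the test actually administered, and no properties of a second test are implied.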

Finally, the exclusive use of global
measures of intelligence in such studies
may be questioned. In the light of
the findings of factor analysis, it
would appear desirable to investigate
correlations with family size, as well
as changes in population
levels, in terms of more nearly
unitary intellectual functions.
It would be of considerable interest
to know, for example, whether
verbal aptitudes are more highly cor-
related with family size than spatial
or other aptitudes, and whether
the rise in mean score found in follow-
up surveys is uniformly high in all
aspects of intelligence.


ANALYSIS OF CAUSAL RELATIONS 


Even with similar results, different
investigators have offered varied in-
terpretations of the correlation be-
tween intelligence and family size.
It is especially important in this con-
nection to differentiate between
plausible hypotheses and empirically
established facts, and to ferret out
tacit assumptions which may under-
lie certain interpretations. For ex-
ample, if intelligence is defined in
terms of hereditary predispositions,
it becomes logically impossible to 
investigate whether or not certain 
individual differences in intelligence 
may be related to gene constitution. 
It is more consistent with scientific 
method to employ definitions which 
do not prejudge the data, but which 
permit the maximum freedom of in- 
terpretation. 

We may, of course, define the 
word “intelligence” to mean any- 
thing we choose. All studies on the 
relation of intelligence to family size, 
however, have employed psychologi- 
cal test scores as their index of intel- 
ligence. It is therefore evident that 
our interpretations must take such 
test scores as their point of departure. 
When intelligence is defined in such 
terms, we may recognize three dis- 
tinct etiological mechanisms whereby 
the obtained correlations between 
intelligence and family size might 
result (cf. 3, 59). It is understood, 
of course, that the actual causal rela- 
tions may involve any combination 
of two or all three of these mecha- 
nisms. 

First, there may be inherited struc- 
tural factors (neural, glandular, etc.) 
which serve as constraints, reducing 
adaptability of behavior and limiting 
the sort of intellectual development 
measured by current intelligence 
tests. The less able parents would 
thus transmit their hereditary limita- 
tions to their offspring. The obtained 
correlations would then result from 
the fact that, within a given culture 
or subculture, persons with inferior 
heredity tended to have more off-
spring.

A second explanation attributes
individual differences in children’s
abilities to psychological differences
in the environments provided by par-
ents of varying intellectual levels.
In this case, the correlation between
family size and intelligence of off-
spring would again result from a tend-
ency for the less intelligent parents
to have more children, but heredity 
would not be involved. Differences in 
intellectual level among the parents, 
as well as among the offspring, could 
thus have resulted from environ- 
mental factors. 

In this connection, mention should 
also be made of the point of view 
represented by Vernon (101). In a
discussion of the use of intelligence 
test scores in population studies, he 
urges the need for caution in inter- 
preting such scores because of the 
effect of differences in cultural pat- 
terns upon test performance. Such 
cultural differences, however, may 
affect not only test scores but also 
the individual’s over-all intellectual 
development. The real question per- 
tains not to inadequacies of available 
measuring instruments but rather to 
the interpretation of the existing be- 
havioral differences. 

It may be added that the indi- 
vidual’s intellectual development may 
be influenced, not only by the nature 
and extent of direct intellectual stim- 
ulation offered by his home, but also 
by emotional and motivational fac- 
tors deriving from the “social cli- 
mate” of the home. There is a grow- 
ing body of data pertaining to differ- 
ences in child-rearing practices in 
various socioeconomic levels and 
other subcultures (4, Ch. 23 and p. 
734). Some of these differences, such 
as the degree to which verbalization 
and exploratory behavior are en- 
couraged or discouraged, are likely 
to influence the child’s intellectual
development.

A third possible interpretation of 
the obtained correlations between 
intelligence and fertility is based upon 
size of family itself as a causal factor. 
For example, a larger family—at 
least in certain socioeconomic levels— 
would reduce the per capita funds
available for education, recreation,
suitable housing, proper food, medi- 
cal attention, and other environ- 
mental requisites. From a psycho- 
logical viewpoint, another important 
factor is the degree of adult contact 
provided in families of different sizes. 
Available evidence suggests, for ex- 
ample, that such contact may be the 
most important single factor in lin- 
guistic development (4, pp. 335- 
339; 62). And it is well known that 
verbal ability plays a major role both 
in educational progress and in intelli- 
gence test performance. Whatever 
the specific factors involved, the 
causal mechanism under considera- 
tion is independent of the intellectual 
level of parents. A crucial test of this 
hypothesis would thus be found in 
the correlation between size of sibship 
and intelligence of offspring, when 
intellectual level of parents is held 
constant. 

One additional point may be con-
sidered. Insofar as family size itself
may be a factor in intellectual de-
velopment, the general decrease in
family size in the period covered by
most follow-up surveys is notewor-
thy. Quite apart from fertility dif-
ferentials, such a decrease may have
operated as one factor in the observed
rise in intellectual level.

It is evident that the three hy-
potheses differ significantly in both
their theoretical and practical impli-
cations. On the basis of existing data
it is impossible to choose among them
or to determine the relative contribu-
tions of each of the three types of in-
fluences. The complex interaction of
many variables makes analysis of
causal relations difficult in this area.
A few of the previously cited results,
however, have a bearing upon causal
interpretations. Thus Sutherland's
study of fatherless children sug-
gested that family size per se is insuf-
ficient to account for the obtained
correlations between family size and
intelligence (88). On the other hand,
the findings on twins in both the Scot-
tish (83) and French (36, 90) surveys
are consistent with interpretations in
terms of adult contact. The results
IE tae and Sutter (36, 90) on inter- 
to ie likewise seem to support 
such an explanation. The fact that
studies of parents have so far failed
to demonstrate a negative correlation
between parental intelligence and
number of offspring is also in line
with the third hypothesis outlined
above.

Vernon (101) recommends the
study of orphans and foster children
whose own parents’ families differ
in size. Such an approach would un-
doubtedly provide crucial data on
the first hypothesis, which concerns
the role of hereditary factors
in the obtained correlation between
intelligence and family size. This
type of investigation, however, would
be handicapped by a number of prac-
tical difficulties, such as the limitation
in number and range of available
cases. Moreover, family size of or-
phans and foster children would be
artificially restricted by such condi-
tions as death of one or both parents
and illegitimacy.

A project specifically designed to
test the role of parental contact in
the relation of intelligence and
family size is reported by Nisbet (69,
70). Three types of data were ob-
tained on several thousand Aberdeen
school children. First, in the effort
to isolate the contribution of language
development to the correlation be-
tween family size and performance
on the ordinary verbal type of intelli-
gence test, the latter correlation was
computed with language test scores par-
tialled out. This partial correlation
was only -.04. When family size
and language test score were cor-
related, with intelligence test score
partialled out, a higher correlation
(-.11) was obtained. Secondly, ver-
bal intelligence tests yielded signifi-
cantly higher correlations with fam-
ily size than did such nonverbal tests
as Raven's Progressive Matrices. 
Thirdly, the correlation between in- 
telligence test scores and family size 
rose with age. The latter finding was 
obtained in both a cross-sectional 
study of the large sample and a longi- 
tudinal study of a small sample. 
From the results of these three ap- 
proaches, Nisbet concluded that part 
of the negative correlation between 
family size and intelligence may be 
attributed to the effect of sibship size 
upon verbal development, which in 
turn influences intellectual develop- 
ment. 
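
The operation of "partialling out" a third variable can be sketched with the standard first-order partial correlation formula, r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The correlation values in the example are hypothetical and are not Nisbet's data. They merely show how a moderate zero-order correlation can shrink toward zero once a correlated third variable is held constant.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical zero-order correlations (illustrative only):
# family size, verbal intelligence test score, language test score.
r_size_iq = -0.25
r_size_lang = -0.28
r_iq_lang = 0.80

# Family size vs. intelligence score, language score partialled out.
print(round(partial_corr(r_size_iq, r_size_lang, r_iq_lang), 3))

# Family size vs. language score, intelligence score partialled out.
print(round(partial_corr(r_size_lang, r_size_iq, r_iq_lang), 3))
```

With these assumed inputs the first partial correlation is close to zero while the second remains larger in absolute value, which is the qualitative pattern described in the text.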

In a later study on 288 adult 
women, Scott and Nisbet (80) again 
found that the negative correlation 
between size of sibship and intelli- 
gence test score is lower for nonverbal 
than for verbal tests. Such results 
corroborate Nisbet’s findings on the 
child sample described above, and 
suggest that the environmental fac- 
tors associated with family size may 
affect test performance well into 
adulthood. 

It should be noted that the psy- 
chological effect of family size upon 
intellectual development may be 
curvilinear rather than rectilinear. 
Moreover, the direction in which in- 
creasing family size influences intel- 
lectual development may itself differ 
with concomitant psychological and 
social circumstances. Attention should 
likewise be given to the possible ef-
fects of family size upon nonintel-
lectual characteristics, such as so-
cialization, cooperation, emotional
security, leadership, and other aspects 
of interpersonal relations. Available 
research on this question has yielded 
conflicting and ambiguous results (10,
11, 26, 27, 32, 55, 57, 86). Some
studies provide evidence that certain
desirable emotional characteristics, 
as well as acceptance by associates, 
may be positively related to member- 
ship in relatively large families (9, 
11, 27). Most of the results, however, 
are complicated by the correlation 
between indices of social adjustment, 
on the one hand, and both intellectual 
and socioeconomic level, on the other. 
Differences in family size among 
ethnic subgroups within a sample are 
another source of confusion in such 
data. 

In recognition of the methodologi- 
cal and interpretative complexities 
of the problem, there is a growing 
interest in the design of more nearly 
definitive investigations on the rela- 
tionship between intellectual level 
and fertility.® Ideally, such investi- 
gations should begin with the testing 
of young people prior to their educa- 
tional and vocational differentiation, 
i.e., after all have completed a uni- 
form period of required schooling. 
Preferably the test should consist of
a differential aptitude battery yield-
ing a profile of scores rather than a
single global score. The subjects
should be followed until all or
most of their families are com-
pleted. Age of both parents at the
birth of their first and last child
should be recorded. Data should also 
be kept on deaths, unmarried per- 
sons, and childless marriages. In- 
formation should likewise be gathered 
regarding occupation, income level, 
and amount of subsequent education 
for each member of the group. It 
would also be of interest to obtain 
indices of social mobility, such as 
changes in occupational, educational, 
or income level within the subject’s 
own life, as well as differences be- 
tween his status and that of his par- 
ents. 

Additional questions can be an- 
swered if test scores of the children 
of these persons are studied; but 
many problems can be solved even 
prior to this step. From a practical 
viewpoint, such a program is not un- 
realistic, especially in nations where 
uniform psychological tests are ad-
ministered in the school system and
where detailed census data are regu- 
larly collected. From a theoretical 
standpoint, this approach would help 
to separate the many interrelated 
variables which are now intricately 
intertwined, and should thus bring us 
closer to a causal interpretation of
the empirically observed relationship
between intelligence and family size.


In 1924, Charles Dunlap (24)
stated quite unequivocally that there 
were no organic lesions or histopatho- 
logical findings which could be shown 
to be characteristic of schizophrenia. 
According to him the changes which 
had been reported were not essen- 
tially different from nerve cell 
changes found in nonpsychotic per- 
sons. This rather sweeping conclu- 
sion was based on his comparison of 
the brains of eight schizophrenic 
patients with those of four “control” 
subjects. His control group con- 
sisted of a bootlegger who had been 
shot through the lung and died 
shortly thereafter, two men who died 
of arsenic poisoning, and one woman 
who died of peritonitis. It appears, 
contrary to Dunlap’s statement, that 
there is room to question the “nor- 
mality” of such a group. At any 
rate, this pronouncement apparently 
had an inhibiting effect on further 
investigations into the role of organic 
factors in schizophrenia. Few such 
investigations were undertaken in the 
decade following Dunlap’s authorita- 
tive statement. In 1930, Freeman 
(33) quoted Dunlap's conclusions 
but was willing to report finding a 
deficiency of catalytic iron in the 
nerve cells of schizophrenics. Here
again seems to be evidence of the 
profound effect of a statement backed 
by an authority figure which blocked 
progress in an area deserving of con- 
tinuing experimental study. 

Perhaps the electroencephalogra-
phers were unimpressed by authority
or were not aware of it. At any rate,
they appear to have been, in large
part, responsible for rearousing inter-
est in the search for an organic factor
in schizophrenia. In the process of
recording the brain waves of various
groups of people, they found among
schizophrenics waves resembling
those found in cases of known brain
damage. Histopathologists and bio-
chemists have followed up these
leads so that the recent psychiatric
literature presents a number of pro-
vocative studies. It is the purpose of
this review to present some of the
recent findings and to suggest some
hypotheses.

1 From the Veterans Administration Hos-
pital, Palo Alto, California.
I would like to thank Mary Ann Swanson
and Elizabeth Grisak for their helpful assist-
ance.

While it is probably true that most
of the work in the physiology, genet-
ics, and psychology of schizophrenia is
at least indirectly related to the sub-
ject of brain dysfunction, the electro-
encephalographic, histopathological,
and biochemical studies are most
directly relevant and of sufficient
importance in themselves to merit
concentrated attention. The review
will include these studies as well as
several psychological and neurologi-
cal studies which focus specifically on
the question of brain disorder in
schizophrenia. Studies made within
the last twenty to twenty-five years
will be included.

It is felt that psychologists could
make a worth-while contribution to
the studies of possible central nervous
system pathology among schizo-
phrenics. Measurement of organic
damage by various kinds of psycho-
logical tests has been rewarding to the
extent that tests have been shown to
be useful in research and in theoreti-
cal assumptions about the nature of
functioning in cases of cerebral damage.
In many clinics and hospitals neurol-
ogists have come to depend on psy-
chological tests in the diagnostic
evaluation of central nervous system in-
volvement. It is hoped that this re-
view will stimulate research interest
on the part of psychologists who are
concerned with the physiological cor-
relates of behavior.

EEG STUDIES 


In 1937, Travis and Malamud (76)
reported a study unfavorable to the
hypothesis that schizophrenics show
EEG's different from the normal.
These authors were apparently
only incidentally interested in this
question since they were more
interested in testing some of Berger’s
hypotheses about the na-
ture of the EEG. However, they did
not find differences in either amplitude
or frequency of the waves of nine
schizophrenics compared with those
of a group of seventy-five normals
and a group of stutterers. They were
aware of the small size of their group
of schizophrenics and are tentative
in their conclusions.

In the following year, Gibbs,
Gibbs, and Lennox (34) found that
epilepsy is present about three times
as often among schizophrenics as in
the general population. Gibbs et al.
state that their experience shows
that the majority of schizophrenics
they have tested show abnormalities
of the EEG. They found both fast and
slow EEG patterns. Although they
do not present statistics, they were
impressed by the similarity of many
of the records of schizophrenics to
those of psychomotor epilepsy. This
finding should arouse some interest
in the possibility of an organic factor
in those cases in which an epileptic
pattern is found. The meaningfulness
of such a question would, however,
depend upon the acceptance of an
organic basis for psychomotor seiz-
ures.

Jasper, Fitzpatrick, and Solomon 
(48) studied the EEG’s of 82 schizo- 
phrenics, so diagnosed by agreement 
of five psychiatrists (no small ac- 
complishment in itself). They also 
included a group of 51 epileptics and 
60 normal subjects. They found, as is 
expected of schizophrenic records by 
now, no specific EEG pattern, but 
rather, a wide range of variability.
Contrary to the study of Travis and 
Malamud, they found a variety of 
anomalies to a more extreme degree 
and in a greater number among 
schizophrenics than normal subjects. 
Specifically, they report that 23 per 
cent of the cases diagnosed as schizo- 
phrenic showed either clinical or 
electroencephalographic evidence of 
brain activity similar to that char- 
acterizing the epileptic. They are 
somewhat hesitant in their conclu- 
sions about the role of organic factors 
in producing the abnormal EEG's. 
They state, “15% of our group of pa- 
tients diagnosed as schizophrenic 
may well be suffering from mental
disorders due to head injury, mental
deficiency, or other organic defects
of the brain” (48, p. 849).

Another study involving 52 chronic 
schizophrenics and 500 control cases 
reports essentially the same results. 
Davis and Davis (19) also feel that
schizophrenic patients do not have a
typical “schizophrenic EEG” in the
same sense as, for example, the spike
and slow wave of petit mal epilepsy.
They found variations in wave pat-
terns outside the range of normal.
They come to much the
same conclusion as Jasper et al., but
are more cautious in their
statement. “The quality of the
atypical variations resembles that of
the changes which occur in the elec-
troencephalograms of persons known
to have brain lesions or who suffer
from the various forms of epilepsy, 
or normal persons during sleep, or 
when breathing an inadequate supply 
of oxygen” (19, p. 1019). They per- 
mit the reader to draw his own con- 
clusions about the meaning of their 
findings in regard to possible organic 
factors in some schizophrenics. 

In 1940, Rubin (70) published 
EEG studies of 14 schizophrenics, 2 
manics, and 1 case of traumatic psy- 
chosis. He used the per cent time 
alpha as a measure. By studying the 
distribution of per cent time alpha 
over the various areas of the cortex, 
he felt that he was able to detect the 
presence of atrophy in these cases. 
His diagnosis was confirmed in 10 
of the cases by pneumoencephalo- 
grams. Air studies had not been done 
on the remaining subjects. Rubin 
states not only that there was central 
nervous system pathology present, 
but also that the nature of this 
pathology was atrophy. Rubin's 
results were confirmed by Moore, 
Nathan, Elliott, and Laubach (55). 
In the majority of their sixty cases 
they found evidence of atrophy, en- 
largement of the ventricular system, 
and cisternae. A study of the films 
suggested that there was a selective 
atrophy involving the parietal lobe 
and insula. Unfortunately they do 
not report on the frequency of occur- 
rence. It could be inferred that at 
least 31 of the cases showed such 
atrophy. Again it is difficult to assess 
these findings since the authors used 
no control group and hence could not 
evaluate the frequency of atrophy 
among nonschizophrenics.

Davis (20) studied the EEG’s of 
132 schizophrenics in an attempt to 
work out a classification system for 
schizophrenic EEG’s. She found that 
she could isolate three general types 
of patterns. Group I showed essen-
tially no differences from the normal.
Group II showed dysrhythmias with-
out convulsive characteristics. Group
III contained records which sug- 
gested the possibility of localized or 
generalized irritative lesions, cysts, 
atrophy, or other pathological condi- 
tions. This study again suggests that 
an appreciable number of schizo-
phrenics show electrical activity of
the cerebral cortex which has long
been associated with some form of
structural brain damage.

Finley and Campbell (31) studied
the EEG's of 500 schizophrenics and
250 normals using a triple classifica-
tion similar to that of the previously
mentioned investigator. They cate-
gorized the records as normal, bor-
derline, or abnormal. When these
classifications were applied to the
two groups they found 40 per cent of
the schizophrenics had normal rec-
ords while 70 per cent of the control
group had normal records. Twenty-
eight per cent of the schizophrenics
had abnormal records while 7 per cent
of the normals had abnormal records.
The remaining subjects had records
in the borderline category. Thus the
incidence of abnormal tracings among the
schizophrenics was twice that of the
normal group. Finley and Campbell
found that 30 per cent of the pa-
tients’ records showed tracings in-
dicative of a disorder of function in
the central nervous system. They
point out, however, that the EEG
pattern is not diagnostic of schizo-
phrenics since the kinds of wave ac-
tivity found are not unique to this
group.

Kennard and Levy (50) found
among 100 schizophrenic patients
exactly the same 60 per cent of ab-
normal patterns as reported by Fin-
ley and Campbell. They looked for
correlations between the abnormal
patterns and the patients’ clinical
pictures. The incidence of abnormal
EEG’s was higher for those patients
with early onset and prolonged ill-
ness. If the illness had lasted three
years or less, the incidence of ab- 
normal tracings was only 42 per cent 
as compared to 72 per cent if the ill- 
ness had persisted over ten years. 
This relationship also seemed to hold 
in regard to the degree of intellectual 
impairment shown by the Wechsler- 
Bellevue Test. It is suggested that 
these findings are consistent with the 
idea that “...the schizophrenic
process [can be] thought of as a
progressive disorder which ultimately 
profoundly affects the performance of 
all organic systems including that of 
the cerebral cortex” (50, p. 421). 

Grinker and Serota (35) implicate
the hypothalamus in their EEG
study of schizophrenia. Their report
is in the nature of an hypothesis re-
sulting from the observation of some
autonomic reaction in schizophrenia.
There was little electrical activity in
the hypothalamus and cortex when
their patients were subjected to ex-
ternal cold. They also found that
electrical stimulation of the hypo-
thalamus failed to arouse activity in
that structure or in the cortex.
Among neurotics and normals an in-
crease in activity was apparent.
They suggest that these findings indi-
cate a physiological failure at the
cortical level because the usual hypo-
thalamic driving force is lacking.
The inference could be made from
these formulations that the thalamo-
cortical circuits are not functioning
so that the cortex is deprived of im-
pulses from lower centers. The Tu-
lane group headed by Heath (38) has
also suggested that subcortical cen-
ters may be implicated in schizo-
phrenia. They report spiking activity
from areas near the thalamus in the
septal region. EEG’s from leads
implanted by the Horsley-Clarke ap-
paratus in subcortical areas of schizo-
phrenic patients often showed the
spiking activity generally believed
indicative of structural damage. 
They apparently felt that if they suc- 
ceeded in eliminating the spike by 
administering electric shock through 
the subcortical electrodes, the patient 
had a better chance for recovery than 
if the spike activity continued. 

One other EEG study is of inci- 
dental interest. Turner and Lowinger 
(77) saw a marked difference in prog- 
nosis for recovery of schizophrenics 
given shock treatment, depending on 
the preshock EEG pattern. None of 
the patients with abnormal preshock 
waves was able to leave the hospital 
after treatment. One-fourth of those 
with normal patterns were discharged 
following the treatment. Again, a 
relationship between the presence of 
abnormal wave activity and malig- 
nancy is suggested. 

In summary, it is interesting to 
note the progressive changes in the 
significance ascribed to EEG findings 
in the research on schizophrenia. The 
first studies emphasize lack of con- 
sistency in the tracings. In the later 
studies, greater confidence is placed 
in the idea that EEG studies have 
demonstrated brain dysfunction in an 
appreciable number of cases of schizo- 
phrenia, despite variability in the 
tracings. Finally, some investigators 
have been willing to state the nature 
of the difficulty, i.e., atrophy of the
cortex and/or failure of thalamo-
cortical circuits. The histopatholo-
gists have been much more specific
in their statements regarding the 
nature of the brain pathology found. 
This larger group of studies will be
reviewed next.


HISTOPATHOLOGICAL STUDIES


One of the persistent proponents 
of organic factors in schizophrenia is 
Eugen Bleuler (12). His monograph
on the schizophrenias is no doubt still
the outstanding text in the field. It
was his early clinical observations of
sensory disorders among such pa- 
tients that led him to suggest organic 
dysfunction of the brain. In 1930, 
Bleuler wrote a provocative essay 
(13) in which he tried to sort the 
psychogenic from the physiogenic 
in schizophrenia. It seems worth- 
while to quote part of his conclusions 
in regard to the role of the central 
nervous system in schizophrenia: 


Many cases of stupor, with their general 
prostration of the elementary psychic func- 
tions, conception and train of thought, often 
point clearly to brain pressure and, on autopsy 
tense edema of the pia or brain swelling is 
found. In the various forms of such deliria, 
the fundamental similarity of certain symp- 
toms or of the whole picture to other physio- 
genic conditions, intoxication, fever psychoses, 
epileptic absences, meningitis, encephalitis, 
cannot be denied, and in all such cases we also 
find in the autopsy histological alterations of 
the brain tissue which show some uniformity.
But in all chronic cases too, decreases in the 
amount of ganglion cells and certain changes 
in the glia, furnish proof that we are in the 
presence of a brain lesion . . . (13, p. 206).


There are serious difficulties in ex- 
perimental histopathology, many of 
which have not been overcome. For
instance, as Spielmeyer (74) points 
out, it must be remembered in using 
necropsy material that the diseases
which cause death may themselves
bring about changes in the brain.
Also, it is not known what the agonal 
process itself may do to brain cells.
Neither of these variables has been 
thoroughly explored. It might be 
added that although biopsy material 
can now be obtained easily at the 
time of prefrontal lobotomy, control 
material from normal subjects is not
so readily procured. If control ma-
terial is not obtained, it is impossible
to assess the extent to which the re- 
sults may be attributed to the experi- 
mental procedure. Many of the stu- 
dies to be reviewed in this section 
have not overcome these difficulties 
altogether. Hence, the degree of con-
fidence to be placed in the conclusions
of these studies is necessarily limited, 
as is also the generalizability of the 
results. 

In 1930, Spielmeyer (74) recog- 
nized these potential sources of error, 
but still felt justified in concluding 
that characteristic histological 
changes were evident in schizo- 
phrenia. The finding he emphasizes 
is the loss of cells in the cortex. He
also reports changes in glial cells and
the presence of destruction products.
He concludes that there is sufficient 
evidence to state definitely that 
schizophrenia is an organic process. 

In 1932, Bamford and Bean (6)
studied a series of cases they describe
as acute dementia praecox. The
authors report on three cases but
base their conclusions on an unspeci-
fied number of additional cases.
Their conclusion is quite different
from Spielmeyer’s. In fact, this study
is one of the few reporting
completely negative results. (It
should be remembered, of course,
that negative results are less likely
to be reported than are positive find-
ings.)

As is often found in the psychiatric
literature, there are a number of case
studies reported. Several of these
will be reviewed even though they do
not represent well-controlled or
truly experimental studies. It is
perhaps not necessary to state that
fruitful hypotheses may be derived
from observation of the single case.

Ferraro (29) reports two cases di-
agnosed dementia praecox. He found
swelling of the oligodendroglia in
both cortical and subcortical areas,
but enlargement was most prominent
in frontal and temporal areas. He
also noted demyelinization, perivas-
cular infiltration, and the presence
of gitter cells. These were taken as
indications of a degenerative process
not to be expected in the normal
brain. Ebaugh, Barnacle, and Neu-
berger (25) detected histological al-
terations in fatalities resulting from
electroshock therapy. They found 
changes in astrocytes and in the 
neuroglia, and suggest that although 
these modifications may have been 
due to the shock they were not
limited to areas of strongest current.
Their conclusions are not explicitly
stated, but they imply that shock
alone could not account for the brain 
damage. Ferraro (30) reports an- 
other case which had been diagnosed 
as schizophrenia for more than ten 
years. The clinical picture was 
clearly that of a schizophrenic psy- 
chosis, but on autopsy a diffuse de- 
myelination of the brain was found. 
The disease was apparently so wide- 
spread and progressive that it could 
scarcely be attributed to a sudden ill-
ness. Polatin, Eisenstein, and Bar-
rera (66) describe two typical cases
of chronic schizophrenia, similar to 
those of Ferraro. Their discovery of 
slow wave activity in the EEG's was 
the first clue to the presence of or- 
ganic damage; pneumoencephalog- 
raphy revealed cerebral atrophy; and 
the Rorschach suggested organic 
brain disease. Biopsy studies revealed 
what the authors considered to be 
irreversible changes in the cerebral 
cortex. 

Roizan, Moriarity, and Weil (68) 
examine the case of a 34-year-old 
woman who died after a brief acute 
psychosis diagnosed as catatonic
schizophrenia. Here, as in Ferraro’s
research, study of the brain tissue re-
vealed a demyelinating process in
the central nervous system. These
authors interpret their findings not
as indicating that schizophrenia is an
organic process but that impaired
brain function acts as a precipitating 
factor for a schizophrenic reaction. 

Holt and Tedeschi (45) and Van 
der Horst (78) report—as did the
previous investigators—the presence
in individual cases of demyeliniza- 
tion, microglial and oligodendroglial 
hyperplasia, loss of cells, and vascular 
infiltration. Additionally, Holt and 
Tedeschi detected cytoplasmic in- 
volvement in the nerve cells them- 
selves along with nuclear disturb- 
ances and lipoid deposits. 

One of the most interesting and 
best controlled studies was made in 
1944 by Kirschbaum and Heilbrunn 
(51). They took biopsies from the 
frontal lobes of eleven chronic schizo- 
phrenics and also from three normal 
subjects. Biopsies from several cats 
and rats provided further control 
material. For the experimental ma- 
terial they report ganglion cell de- 
generation as well as alterations in 
glial cells and blood vessels. These 
changes are similar to those found in 
chronic intoxication and metabolic 
disorders. Comparison of experi- 
mental and control material indicated 
that the results could not be attrib- 
uted to the experimental proce- 
dures. This study seems most im- 
pressive because many of the method- 
ological flaws of other studies were 
avoided: The tissue was taken from
the living organism and studied im- 
mediately, thus ruling out the com- 
plicating effects of the agonal process 
as well as the disease which might 
cause death. Also, the use of con-
trols in this study held constant any
effects the anesthetic and surgical 
procedures might have had. 

Another well-executed biopsy 
study is reported by Elvidge and 
Reed (27). They used material from 
19 psychotic patients, 13 of whom 
were schizophrenic. Biopsies were 
also obtained from 16 control sub- 
jects, mostly epileptics, and from 14 
laboratory animals. They too report 
for the schizophrenic subjects changes 
in glial cells, with hypertrophy and 
swelling. In some cases the oligoden-
droglia showed nuclear modification.
These changes were most apparent
in the white matter; in some cases
they were generalized, while in other
cases they appeared more patchy.
Similar changes were found for those
control cases in which the patient
showed clouding of consciousness be-
tween epileptic seizures, but not for
those cases in which the patient re-
mained clear and oriented between
seizures. A number of the psychotics
were followed over periods of from
one to two years, and the biopsies
were repeated. The findings were the
same as before. It is unfortunate that
none of the cases had remitted be-
tween biopsies, for if it could have
been demonstrated that the changes
had slowed down or ceased in recov-
ered cases, it would have been an im-
pressive argument for the importance
of organic factors in schizophrenia.
Elvidge and Reed suggest that their
results indicate disturbances or in-
terruptions of commissural pathways
in the brain, and that it is this “dis-
ruption” of impulses to and from var-
ious parts of the brain which causes
disturbances in intellectual and emo-
tional reactions.

Winkelman and Book (80) made 
a post-mortem study of 10 cases of 
“typical” schizophrenia. Their find-
ings are listed as (a) focal cell loss,
(b) a general decrease in the number
of nerve cells, especially in the an-
terior half of the brain, (c) chronic
cell diseases, including cell shrinkage
and ghost cells, (d) amount of fat in
cells usually greater than expected,
(e) increase in astroglia, and (f) mild
general demyelinization in subcortex.
They claim that the changes found
in the brains of their patients were
much more severe quantitatively
than those usually found in nonpsy-
chotic cases.

From another post-mortem study
come similar results. Rupp and Wil-
son (71) studied 37 patients, all under
fifty years of age. Biopsies revealed
loss of cells, gliosis, atrophy, edema,
vascular lesions, and areas of soften-
ing. The authors do not report data
for control subjects.

Probably the most persistent in- 
vestigators in the histopathology of 
schizophrenia are Papez, and Papez 
and Bateman (57, 58, 59, 60, 61, 62, 
63). Their methods and techniques 
are carefully detailed and appear to 
meet the standards of good research, 
with the exception that the extent to 
which they have studied control ma- 
terial is not always made explicit. 
In 1944, Papez (57) reported finding 
dark staining particles, inclusion 
bodies, in the cytoplasm of nerve 
cells. According to him, these parti- 
cles were not artifacts produced by 
the technique but were living micro- 
organisms of a parasitic nature. In 
1949, Papez and Bateman (59) made 
another study of biopsies. In their 
sample of 50 patients, there were 42 
schizophrenics, 6 manic depressives, 
and 2 syphilitics. They identified
three stages of cell disease. In the 
first stage, the cells showed intact cy- 
toplasm occupied by a small number 
of the inclusion bodies. Cell nuclei 
showed some deformity. The second 
stage revealed highly vesicular nu- 
clei, enlarged nucleoli, and cytoplasm 
filled with inclusion bodies. The 
third stage was characterized by 
naked nuclei stripped of cytoplasm. 
The inclusion bodies had apparently 
invaded the walls of small blood ves-
sels, producing degenerative changes.
The authors studied these inclusion 
bodies under a dark phase microscope 
and concluded that they were dealing 
with a living organism. They were 
able to watch these zooid organisms 
in various phases of the life cycle and
to keep them alive over a period of
many hours in a suspension of cortical
material. Drawings made from mi-
croscope slides are presented in the
article. They indeed suggest an om-
inous picture of parasitic infestation.
Papez and Bateman propose and 
argue convincingly for the thesis that 
the histopathological changes re- 
ported by other investigators are 
associated with various phases in the 
life cycle of the organism. 

Schaderwald (72) confirmed many 
of the findings of the previous in- 
vestigators, using material from 34 
biopsies taken at the time of pre- 
frontal lobotomy. He states that his 
data is in complete agreement with 
that reported by Papez, and Papez 
and Bateman. However, because of 
the lack of controls, he is much more 
cautious about ascribing pathogno- 
monic significance to the findings. 

The reviewer must agree with 
Weinstein (79) in discussing the stu- 
dies of Papez, and Papez and Bate- 
man: “Merely the unusual nature of 
the claims must not be allowed to 
interfere with open-minded consider- 
ations of the findings.... Many 
questions remain which can be an- 
swered only through carefully de- 
signed experimental work with ade- 
quate attention to controlled obser- 
vations...” (76, p. 549). 

The thalamus and autonomic nerv- 
ous system have also been implicated 
as sites of possible lesions in schizo- 
phrenia. In 1939, Stein and Ziegler 
(78) reported a biometric analysis of 
24 patients and 1 normal subject. 
The brains and thalami in the schizo- 
phrenic patients were smaller and the 
cell count lower than for manic-de- 
pressive patients in general. How- 
ever, none of the differences was 
statistically significant. The investi- 
gators felt that the small size of their 
sample contributed to the negative 
findings. 

For a group of 16 paranoid schizo- 
phrenics, Bateman and Papez (7) 
found inclusion bodies in thalamic 
cells as well as cell loss as high as 90
per cent in the association nuclei. On 
this finding they based their proposi- 
tion that hallucinations and delusions 
are the result of abnormal nervous 
discharge of the thalamus to the 
cortex. Since sensory input from 
peripheral sense organs must pass 
through the thalamus, the presence 
of diseased cells there could distort 
the impulses to such an extent that 
the misperceptions characteristic of 
psychoses would be expected. 

Two reports question the impor- 
tance of histopathological findings in 
schizophrenia. In 1949, Rowland 
and Mettler (69) studied 22 schizo- 
phrenics and 1 manic patient. They 
hypothesized that if cell loss is actu- 
ally an important feature in psycho- 
sis, it should be positively correlated 
with severity of illness. However, 
their findings indicated no difference 
in cell count between a group hos- 
pitalized more than twenty-two 
months and a group hospitalized less 
than twenty-two months. They con- 
tend that the various methods of 
counting cells are frequently of ques- 
tionable reliability, and suggest that 
this fact in large measure accounts 
for the conflicting reports of cell loss. 
The same opinion is voiced by Wolf
[...] They add that per-
[...]itive results which have
[...] and on the
[...] artifacts. [...]
studies are not sufficient to permit
accepting the thesis of an or-
ganic factor in schizophrenia. This
position is vigorously opposed by
Winkelman in a discussion following
their article.


In summary, it can be stated that
a number of different pathologists
report similar results. A frequent
finding is loss of nerve cells. Pneumo-
encephalographic reports of atrophy
appear to support this impression.
Other repeated findings are: the pres-
ence of a demyelinizing process, de-
generating nerve cells, and patho-
logical changes in the various glial
cells. It seems fairly well established
that the brains of some schizophrenics
show structural damage, but the sig-
nificance of the damage is difficult to
evaluate because comparable studies
of nonpsychotic subjects are generally
lacking.


BIOCHEMICAL STUDIES 


So far, very few biochemical stud- 
ies of brain function in psychosis 
have been made. In 1930, Freeman 
(33), after agreeing with Dunlap that 
there were no histopathological find- 
ings in schizophrenia, reported a de- 
ficiency of catalytic iron in the brain 
cells of schizophrenics. His sample
included 35 schizophrenics and 16
patients of other diagnoses. Freeman
suggests that the lack of this catalytic
agent (which is necessary for the use
of oxygen) may underlie some of the
symptoms. An additional inference
that could be made from these find-
ings is that the inability of the cells
to take up oxygen would eventuate in
their degeneration.

Doust (23) used two samples of 
schizophrenics, one English and one 
American, totaling 87 patients. He 
studied the brains by means of the 
spectroscope and photoelectric oxim-
etry, and found marked reduction
in the effective capillary oxyhemo-
globin. In view of the extreme sensi-
tivity of nervous tissue to anoxia and
the irreversibility of the consequent
damage, Doust’s results seem quite
important to the question of central
nervous system involvement in
schizophrenia. Also, it is interesting
to note that this investigator reports
a much more profound deficiency in
capillary oxyhemoglobin among the
so-called “process” schizophrenics
(i.e., classical dementia praecox) than 
for any other group studied. The re- 
viewer feels that this finding has con- 
siderable significance for methods of 
selecting groups of schizophrenics for 
study. This point will be elaborated 
in the discussion to follow. 


PHYSIOLOGICAL STUDIES 


From England, Hoffer, Osmund, 
and Smythies (44) report the acci- 
dental discovery of adrenochrome, or 
pink adrenaline. This substance, 
closely related to adrenaline, is a de- 
terioration product of that hormone. 
Although chemically it is only slightly 
different from adrenaline, its effect 
on the body is quite different. It 
shows toxic effects in much the same 
way as does mescaline. For this rea- 
son, adrenochrome caught the inter-
est of investigators hypothesizing
toxicity as a basis for schizophrenia. 
(The catatonic subgroup has usually 
been the object of such study, since 
they present many of the symptoms 
generally indicative of toxic disor- 
ders.)

Hoffer, Osmund, and Smythies 
studied the effects of pink adrenaline 
on a group of volunteer normal sub- 
jects. The results are striking. When 
given adrenochrome, their subjects 
showed behavioral symptoms of schi-
zophrenia. Furthermore, their sub-
jective reports confirmed the observ- 
er's impressions that they were de- 
lusional and experienced hallucina- 
tions. EEG’s taken on the subjects 
under the influence of the drug 
showed arrhythmias and epileptic 
patterns within half an hour from
the time of administration. The
authors hypothesize that when adren-
ochrome enters the cerebral cells it
inhibits carbohydrate metabolism
and interferes with cell respiration.
In schizophrenics, an excess of adren-
ochrome may be produced, resulting
in autointoxication. One might also
speculate about a possible connection 
between an excessive production of 
adrenochrome and the hypothesis 
that schizophrenics suffer a chronic 
hyperactivity of the sympathetico- 
adrenal system. 

The endocrine system has been
considered the site of possible path-
ology in schizophrenia for well over
half a century. Every gland from sex
to pituitary has at one time or an-
other been identified by various
investigators as the locus of the diffi-
culty. The findings have been
tantalizing but equivocal, and the
work done in this area would con-
stitute a bibliography several times
larger than the present one. How-
ever, a few of the studies which ex- 
plicitly link endocrine and autonomic 
nervous system function will be men- 
tioned. 

Hoskins (46), in reviewing the
literature on endocrine function in
schizophrenics, reports some of the
work done over a period of twenty
years at Worcester. His observations
led to the conclusion that in an ap-
preciable number of cases thyroid
dysfunction is a basic factor. Some
schizophrenic patients seem to suffer
from a lack of thyroid secretion; how-
ever, continued treatment with the
hormone raised the question of
whether the target tissue was not at
fault. That is, since some patients
did not respond even to massive doses 
of thyroxin, Hoskins concluded that 
the difficulty is not so much in thy- 
roid gland production itself as it is in 
an apparent failure of the appropriate 
receiving or “target” tissue to utilize 
the hormone. 

The other glands which frequently 
show anomalous reactions are the
adrenals. The function of the adre-
nals is relevant to our subject, for
although there is no direct nervous con-
trol of the adrenal cortex, its function
is intimately related to that of the
autonomic nervous system. Indi- 
rectly, the secretions of the adrenal 
cortex are influenced by the auto- 
nomic nervous system through the 
hypophysis, which is controlled by 
the hypothalamus. In the past few 
years there has appeared an impres- 
sive number of studies on adrenal 
function in schizophrenia. No doubt 
the research on adrenal activity has 
been accelerated because ACTH and 
cortisone have become available as 
research tools. 

Hoskins (46) suggests that in 
schizophrenia the adrenal-autonomic 
mechanism is somehow disturbed, 
probably ultimately expressing itself 
in altered metabolism. Work on the 
adrenals and the emergency functions 
has been followed up by Hoagland 
and others (17, 39, 40, 41, 42). These 
studies are typical of the many re- 
ported. The primary finding appears 
to be that the schizophrenic exhibits 
a physiologically subnormal response 
to stress. Altschule, Promesel, and 
Parkhurst (3) found that the reac- 
tions of schizophrenics to ACTH in- 
jections are not reduced, so they 
propose that the difficulty lies in 
the hypothalamus. Perhaps this
vital center fails to activate the pitui- 
tary to secrete sufficient adrenocorti- 
cotropic hormone, resulting in a re- 
duction of the usual adrenal stress 
reaction. Hoagland et al. (39) sup- 
port this point of view. 

After a rather thorough review of 
the literature on the bodily functions 
in psychoses, Altschule (2) concludes 
that most of the disordered physi- 
ology reported is a consequence of 
the psychosis rather than an etiologi- 
cal factor. He makes one exception, 
namely that disturbances in adreno-
cortical function may cause metabolic
[...] interfere with [...] psychoses.
[...]izes that there has been a
general tendency to ignore the physi-
ology of mental illness. This tend-
ency has been reinforced by the con- 
tinued use of the term “functional,” 
which implies, wrongly, a lack of 
physical or biological imbalance. Ac- 
tually, there is no such thing as a 
functional psychosis. Altschule states 
in regard to this point: “Theories 
that ignore cerebral metabolic proc- 
esses and regard psychosis as merely 
a state of mind, deliberately chosen 
by the patient because of environ- 
mental influences, have nothing— 
not even reasonableness—to support 
them” (2, p. 212). 

Paralleling the endocrine studies is 
a series of experiments with various 
drugs which produce “model-psy-
choses” (32). Studies by DeJong (21)
with mescaline, bulbocapnine, and
other catalepsy-producing drugs led
him to hypothesize a “faulty detoxifi-
cation theory” of catatonia. Accord-
ing to him, the liver is the organ at 
fault. 

Since DeJong’s studies, numerous 
other drugs have been used in an at- 
tempt to find a toxic agent responsi- 
ble for psychosis. Of these, one 
which has recently received consider- 
able attention is lysergic acid diethyl- 
amide (LSD-25) (22, 32, 43, 47, 52). 
This drug seems to act on the hypo- 
thalamus and, in normals, produces
many of the psychological effects 
characteristic of the schizophrenic 
psychoses. The drug has also been 
administered to schizophrenics, and 
the reported effects vary. In general, 
however, the effect seems to be one of 
exacerbating existing symptoms. For 
normals, these effects are reported: 
visual disturbances, thought disor- 
ders, mood changes, paranoid tend- 
encies, auditory phenomena, etc. 
Physiological changes are also noted.
These changes are principally those
associated with autonomic activity.
This leads to the hypothesis that it is
the homeostatic balance which is dis-
turbed by the drug (67).

On the negative side, Sloane and 
Doust (73) interpret their results as 
indicating that schizophrenics show 
a diminished or unchanged auto- 
nomic reactivity following adminis- 
tration of LSD-25. Their sample in- 
cluded 19 patients and 14 controls. 
In an attempt to clarify the specific 
action of the drug, Mayer-Gross,
McAdam, and Walker (52) found
that it produced changes in phos-
phate metabolism. They propose
that the apparent increased respira-
tion of brain tissue may account for
the psychological disturbances pro-
duced by the drug.

It is important to note that the
last studies reviewed are also of
methodological interest, for when a
disease can be artificially produced
it is an important step in the direc-
tion of control of that disease. The
chemist can understand his com-
pound completely, not when he has
succeeded in analyzing it but only
after he can synthesize it in his labo-
ratory. He is then in a position to
control it and predict its action. It
is evident, of course, that the reac-
tions produced by the administration
of a drug to normal persons cannot
at this stage be unequivocally
equated with schizophrenic psy-
choses. For that matter, as is pointed
out by Pennes (64), the reactions to
LSD-25 cannot be entirely due to the
specific action of the drug; otherwise
the wide variability in its effects
would not have been found. He also
found that not all of his 55 schizo-
phrenic subjects reacted to the drug
in the same way. Actually, in some
cases the drug seemed to have a
temporary normalizing effect. The
most judicious conclusion is probably
that response to the drug is the result
of interaction between psychological
and physiological factors.

In spite of the large amount of
work done with various drugs, the 
mechanism of their action is still not 
well understood. Nevertheless, more 
and more of them are being dis- 
covered and tried out. Some, as re- 
ported by the press, have assumed 
the status of wonder drugs. Reser-
pine (54) and chlorpromazine are two
such. Again, the effectiveness of 
these drugs seems primarily ascribed 
to their action on the autonomic 
nervous system (36). 


PSYCHOLOGICAL STUDIES 


It is rather disconcerting to find
that schizophrenic patients often re-
spond to psychological tests in ways
that we have learned to consider
characteristic of patients with known
organic involvement. This presents
a difficult problem to the clinician 
who is asked to make a differential 
diagnosis. In the past, test construc- 
tors have been busy trying to show 
how well their tests differentiate the 
current nosological groups. Little 
attention has been paid to the degree 
of overlap, even though the similari- 
ties between two groups on test per- 
formance may be actually more im- 
pressive than the differences. One
exception is the report of Hanfmann
and Kasanin (37) in the discussion of
their widely known test of conceptual
thinking. They found a small but
significant mean difference between
groups of known organics and schizo-
phrenics. Regarding the response
similarity of schizophrenics to or-
ganics, they state, “Cases of this type
often bear a close resemblance to
cases of irreversible brain disease,
except for a greater variability and a
certain imaginative quality of their
productions which are lacking in or-
ganic cases” (37, p. 75). It might be
added that this imaginative quality
is often very difficult to detect.

Altrocchi and Rosenberg (1) also
found overlap of schizophrenics and
organics in their work with a new 
block test of conceptual thinking. 
Some of the so-called functional psy- 
chotics could not be differentiated 
from known organic cases, on the 
basis of the test, although others 
could. The hypothesis is immediately 
suggested that among the heterogene- 
ous group of mental ills called “schiz- 
ophrenia,” there is a considerable 
percentage suffering chronic brain 
damage. 

In an attempt to test this hypothe- 
sis, Brackbill and Fine (14) studied 
three groups of patients: a group of 
cases of known organic damage; a 
group of cases of classical or process 
schizophrenia, selected on the basis 
suggested by Kantor, Wallner, and 
Winder (49); and a group of reactive 
or acute cases. Using Piotrowski’s 
(65) Rorschach signs of organic in- 
volvement, they found that the or- 
ganic and process groups could not 
be distinguished from each other, 
while the reactive group showed sig- 
nificantly fewer organic signs than 
did either of the other two groups of 
patients. This study provides addi- 
tional support for the thesis that 
among schizophrenics there is an ap- 
preciable number who suffer central 
nervous system pathology.

Another approach illustrative of 
the type of contribution which could 
be made by psychologists is that of 
Meadow and Funkenstein (53). They 
classified a group of 58 schizophrenic 
patients into three subgroups on the 
basis of autonomic reactivity. Group 
A had an autonomic pattern indica- 
tive of some kind of “release” from 
cortical inhibition, suggesting a 
breakdown in thalamo-cortical or 
cortico-thalamic interaction. On 
tests of abstract thinking these pa- 
tients showed the severe impairment 
usually associated with brain dam- 
age. The general behavior of these
patients was of the nature which
typically indicates poor prognosis. 
On the other extreme were the symp- 
tomatically schizophrenic patients of 
group B, who showed no disturbance 
in autonomic reactivity and no dis- 
turbance in abstract thinking, little 
anxiety, and no organized delusions. 
Prognosis was guarded, but was bet- 
ter than for group A. The C group 
showed severe anxiety, a different 
pattern of autonomic balance, and 
the best prognosis. It could be hy- 
pothesized that group A represented 
the classical or typical schizophrenic 
with organic symptoms, while the 
other two groups represented reac- 
tive or situational disturbances. 


HYPOTHESES AND SUGGESTIONS 


The various investigators in this 
area have suggested a number of 
ideas and concepts in regard to the 
meaningfulness of these findings. 
Eickhoff (26) and Anderson (4) pro- 
pose an organic factor in childhood 
schizophrenia. The former sum- 
marizes her observations as follows: 

I am therefore, postulating that schizo- 
phrenia in childhood is an arrest in the devel- 
opment of abstract thought and emotional 
maturity at an infant or toddler level; that 
this arrest is dependent basically upon a de- 
fect in the acquisition of general sensation; 
and this is due either to a defect in the neuro- 
logical systems concerned with pain, touch, 
temperature, position, and vibration sense or 
to faulty stimulation from the outside or 
both; and this defect leads to a delay in the
formation of the body image and other im-
ages (26, p. 234).


Much the same statement is made 
by Anderson as the result of her ob- 
servation of schizophrenic children:


This outstanding modification in my con- 
ceptual thinking concerning schizophrenia 
would be in seeing the development of these 
reactions as being determined not by a rather 
ambiguous rejection behavior on the part of 
the mother, or any other significant person, 
but as a failure in interpersonal relations 
brought about primarily by a very specific 
type of organic brain deficit in the child. The 
exact nature of the physiologic pathology
would probably be in the associational path-
ways of the most superficial layers of the
cortex. The deficit might be of any degree 
from minimal to extensive (4, p. 37). 


Lauretta Bender's (10, 11) long 
years of work with schizophrenic 
children has led her to a biological 
conception of schizophrenia. At 
times she speaks of it as an organic 
brain disease, proposing that there 
is an inherited weakness in the home- 
ostatic mechanism. This weakness re 
autonomic function is such that the 
usual crises of physiological develop- 
ment may be sufficient to precipitate 
the illness. Follow-up studies of
schizophrenic children showed that
the symptoms closest to the biological
were the best predictors of adult
schizophrenia. This again bolsters
the concept of a process schizo-
phrenia, which starts early, is malig-
nant, and is nonremitting.

All three of these investigators find
evidence of brain damage even in
these most early cases of schizo-
phrenia. It is interesting that the
defect is suspected to be somewhere
on the input side of the nervous sys-
tem, for Bateman and Papez (7)
also suggest a distortion on the input
side, resulting from cell disease in the
thalamus. Also, according to Nie[...]
(56), there is a kind of schizophrenia
that is based on a diencephalic [...]
of a specific nature. He believes that
the disease may eventually reach the
cortex late in the process, and pro-
poses that the defect is probably in-
born, producing a disorganization of
neuronal patterns. He proposes to
call this an “apsychotic schizo-
phrenia.” Patients of this group are
also called “ambulatory schizo-
phrenics.” They are nomadic, make
a borderline schizoid adjustment,
and their contacts with hospital or
clinic are unproductive of any change
in the pattern of adjustment.

Bychowski (15, 16) speaks of
schizophrenic thinking as similar to
the kind of organic reactions de-
scribed by Goldstein. After pointing
out the concrete character and the
poverty of thought processes for
the two groups, he arrives at the
conclusion that there is a dynamic
deficiency of cortical and subcortical
organization in schizophrenia. He
holds that a certain group of
schizophrenics suffer an agnosia and
apraxia of thought based on cerebral
deficit.

Bellak (8, 9), after reviewing some
[...] on schizophrenia, has
formulated what he calls a “multiple
factor theory.” He maintains that
the illness can be the result of a multi-
plicity of etiological factors. In one
case it might be the result of liver
disease; in another, organic brain
[...], etc. The symptomatology of
schizophrenia may be a sort of
“final common pathway” for the ex-
pression of illnesses produced by
[...] different causes. This formula-
tion is apparently an attempt to en-
compass the divergent—and at times
confusing—findings in schizophrenia.
In such a system, one could think of
the behavioral symptomatology as
the phenotype which is the expres-
sion of a number of different geno-
typical factors.

It is the reviewer's impression that
the findings reported by these in-
vestigators merit more consideration
from psychologists than they have
so far received. It is probably true


EXPERIMENTAL EVALUATIONS OF ROLE PLAYING¹


JOHN H. MANN 
Columbia University 


In recent years a number of articles 
have been written on the use of role 
playing in education, personality as- 
sessment, role training, and psycho- 
therapy. Yet, in spite of the wide- 
spread popularity of role playing, 
comparatively few studies have been 
reported in which role playing was 
evaluated experimentally. 

The present article reviews such 
experimental studies in an attempt 
to establish the degree to which pres- 
ent practice is justified by experi- 
mental evidence, and to indicate the 
problems toward which future ex- 
perimentation might most profitably 
be directed. 

A role-playing situation is here de- 
fined as a situation in which an indi- 
vidual is explicitly asked to take a 
role not normally his own, or if his 
own in a setting not normal for the 
enactment of the role. 

Examples of taking a role not one’s 
own are typically found in assessment 
procedures. For example: 

You are a twelve-year-old boy without 
brothers and sisters. All year long you have 
been looking forward to having your first ex-
perience in boy scout camp which runs for two
months this summer. Camp begins next week. 
You are starting to pack as your father enters 
the room (7, p. 374). 


Taking one’s own role in an un- 
usual setting typically occurs in the 
context of a procedure designed to 
produce personality change. In such
situations the individual is asked to
portray scenes from his own life be-
fore a group of other people. The in-
dividual is acting his own role but the
setting is unusual. For example:

We would like to see how you normally
handle a customer when he comes in your
store. Suppose Mr. X plays one of your
customers. He has just asked you the price
of a certain article. What do you do?


¹ I am indebted to Dr. Edgar Borgatta and
Dr. Leonard Cottrell of the Russell Sage
Foundation for reading this manuscript and
providing many helpful suggestions.


Studies of role playing can con- 
veniently be grouped in accordance 
with the two definitions considered 
above. That is, they can be consid- 
ered to treat role playing either as a 
method for assessing personality or 
as a method for producing personality 
and behavioral changes. This classi-
fication will be adopted in the review
which follows.²


ROLE PLAYING AS AN
ASSESSMENT PROCEDURE


Early work in this area was strictly qualitative. In 1945, for example,
Moreno and Moreno (17) used role playing to study the social stereotypes
of children. Other examples, summarized by Eaton (9), occurred during
World War II. At this time use was made of role playing in the assessment
and selection of personnel. The studies of the OSS (18) and the proposed
standardized assessment procedures of Bronfenbrenner and Newcomb (7) are
outstanding examples of the qualitative work done during this period.
Interest in quantitative measurement did not occur in this area until
experimenters became concerned with the reliability and validity of their
measuring instruments.

2 There have been a number of studies in which the experimenters have
consciously played a role as part of the experimental procedure. These
studies are not included in this review for two reasons. First, the
experimental subjects themselves do not role-play. Second, role playing
is used strictly as an experimental technique for fabricating reality,
and is not itself the object of study.

Reliability 
Probably the first to test the reliability of ratings of role-playing
performance were Rotter and Wickens (20). They asked observers to rate
subjects placed in two standard role-playing situations on the single
characteristic of “social aggressiveness.” During one situation the
subject was observed by a panel of four observers. During the second
situation he was observed by a different panel of observers. There were
also three observers who watched his behavior in both situations. Under
these conditions a mean interrater reliability coefficient of .71 was
obtained for a panel observing a single situation (N of 48). The
correlation between the mean ratings of the members of the two panels on
the behavior of the 48 subjects over two situations was .55, while the
mean ratings of the three observers who watched the subjects in both
situations had a reliability coefficient of .77. The .55 correlation may
be interpreted not only as rater reliability but as subject consistency
from situation to situation. The larger .77 correlation may result in
part from a halo effect.

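
As an illustration of what a mean interrater reliability coefficient summarizes, the following Python sketch averages the Pearson correlations over every pair of observers in a panel. It assumes that "mean interrater reliability" refers to the mean pairwise correlation, which the text does not state explicitly; the ratings and the function names are hypothetical and are not taken from Rotter and Wickens.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_interrater_r(ratings):
    """Average the correlations over every pair of raters.

    `ratings` holds one list per rater, each with one score per subject.
    """
    pairs = list(combinations(ratings, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical panel of four observers rating six subjects
# on a five-point "social aggressiveness" scale.
panel = [
    [3, 5, 2, 4, 1, 4],
    [2, 5, 3, 4, 1, 3],
    [3, 4, 2, 5, 2, 4],
    [4, 5, 1, 3, 2, 4],
]
print(round(mean_interrater_r(panel), 2))
```

The same function could be applied to the mean ratings of two different panels to obtain a between-panel figure of the kind reported above, although such a figure mixes rater agreement with subject consistency across situations.
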
Stanton and Litwak (22), in a study devoted to evolving a test of
autonomy, asked five judges to record the number of times they heard
examples of “nonideal” behavior on 11 tape recordings of role-playing
situations.³ In this way they obtained a mean reliability coefficient
of .90 (rho).

Moldawsky (16), also using tape recordings of role-playing situations,
was able to obtain an interrater reliability coefficient of .89 for
ratings of “rigidity” of behavior.

3 Stanton and Litwak equate nonideal behavior with lack of autonomy.
This may be a dubious equation, but it does not affect the present
discussion.

Kelly and Fiske (13), interested in predicting success in clinical
psychology, asked judges to rate subjects in role-playing situations on
characteristics which were unrelated to the immediate role-playing
enactment. Among these characteristics were research ability,
therapeutic ability, diagnostic ability, and over-all suitability for
the profession of clinical psychology. Interjudge reliability for these
characteristics was found to be .45, .44, .40, and .51 respectively. In
this study, however, not only was the reliability of the observer's
judgment being measured but also the degree of similarity between his
conception of the abilities required for successful performance in
research, therapy, and diagnosis and that of the other judges. In the
single case in which the individual judges’ ratings were averaged and
corrected by the Spearman-Brown formula, the reliability for the given
variable rose from .51 to .76.

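
The Spearman-Brown step can be checked with a short calculation. The Python sketch below applies the standard prophecy formula, r_k = k r / (1 + (k - 1) r), to the single-judge reliability of .51. The number of judges whose ratings were averaged is not given in the passage; k = 3 is an assumption, chosen only because it reproduces the reported .76, and the function name is illustrative.

```python
def spearman_brown(r_single, k):
    """Reliability of the mean of k parallel ratings, given the
    reliability r_single of one rating (Spearman-Brown formula)."""
    return (k * r_single) / (1 + (k - 1) * r_single)


# Assumed: three judges averaged (k is not stated in the text).
r_pooled = spearman_brown(0.51, 3)
print(round(r_pooled, 2))  # about 0.76
```

With k = 2 the same formula gives roughly .68, and with k = 4 roughly .81, so the reported rise is consistent with pooling three judges under this assumption.
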
Borgatta (2) has studied the reliability of “real” situations and
role-playing situations. In his study the observation system (Bales
Interaction Analysis) was held constant and the reliability of the
categories in the system was measured within the two types of
situations. It was found that the kinds of behavior which an individual
initiates and receives are as consistent in role playing as in “actual”
behavior.

Almost any system for observing human interaction which has been
devised can be used for observing role playing. The reliability
associated with such systems is therefore related to the present
discussion, but here we report only the findings derived from
interaction situations specifically labeled role playing. In summary,
it can be said that the above studies tend to confirm the possibility
of establishing reliable systems of observing role-playing behavior.

Validity 

Studies of validity can be ordered
in relation to the nearness of the test- 
ing situation to the validating situa- 
tion. The following discussion is 
ordered according to this principle, 
starting with studies in which the 
two are adjacent and extending to 
studies in which the testing and 
validating situations are only tenu- 
ously related. 

Borgatta (2, 3, 4, 5, 6) in a series of 
reports has described the results of a 
study in which subjects alternately 
role-played and behaved “actually” 
in an experimental setting. The sub- 
jects of this experiment were assigned 
to 166 three-man groups and rotated 
in such a way as to ensure that each 
subject would have equal experience 
with an array of persons. 

Perhaps the most striking finding 
of this study in relation to the present 
context was that, though the interac- 
tion profiles varied greatly between 
the role-playing and actual situation, 
the performance of group members 
remained remarkably constant in re- 
lation to each other. Specifically the 
correlation between total rate of in- 
teraction initiated and received and 
rate of positive emotional responses 
initiated and received over the two 
situations was .76, .66, .58, and .58
respectively. (A correlation of
was significant at the .05 level.) In
addition, a close parallelism was
found between role playing and actual
behavior over all the Bales Interaction
Categories except that of “asking
for suggestions,” which was a null
variable for the actual behavior.

Certain differences between role
playing and actual behavior were
noted however. Role playing was 
characterized by lack of tension and 
inhibition, and by higher rates of ask- 
ing for and giving opinion in compari- 
son to the actual behavior in which 
more emphasis was placed on neu- 
trality of feeling and task orienta- 
tion. It is also interesting to note that 
leadership self-ratings made by sub- 
jects, and leadership ratings made 
by their associates appeared to cor- 
relate more highly with their per- 
formances in role playing than with 
their performances in the actual be- 
havior situation. 

In a further analysis of these data 
Borgatta (6) compared role playing 
and actual behavior with behavior on 
a paper-and-pencil projective test 
(Conversation Study). The test con- 
sisted of a leaflet in which ten plates 
with various three-man situations 
were drawn. The subject was asked 
to write down what each of the men 
was saying. This imagined conversa- 
tion was then analyzed by the Bales 
Interaction Categories in the same 
manner as the role playing and the 
actual behavior. From these data a 
correlation of .3 was obtained be- 
tween the total rate of response on 
the projective test and that of either
the role playing or the actual behavior.
Role playing and actual behavior,
as previously mentioned, were
highly correlated.

This finding was similar to one ob- 
tained in an earlier experiment by 
Borgatta (1). In that pilot experi- 
ment the comparison was made be- 
tween a subject’s response to the 
Rosenzweig P-F Study when given 
normally (in written form), when 
used as the basis for a series of role- 
playing incidents, and as the basis 
for an analogous series of apparently 
unplanned real-life incidents. In this 
earlier experiment, in which 78 subjects
were utilized, the findings were
not clear because of the low reliability
associated with the scoring system 
which distributed 24 scores over 11 
categories. 

To clarify further the relationship 
among these three types of situations 
Borgatta (6) subjected his data from 
the later study to a factor analysis. 
Of five relevant factors found, two, 
task ability and emotional assertive- 
ness, showed close parallelism over 
role playing and actual behavior and 
some parallelism over the projective 
test. A third factor, military adjust- 
ment, showed some parallelism over 
role playing and actual behavior, 
with a suggestion of parallelism over 
the projective test. In a fourth factor, 
task supportiveness, parallelism was 
found over role playing and actual 
situation only. Finally in a fifth 
factor, emotional group supportiveness, 
no parallelism was found. 

In his work Borgatta measured 
actual and role-playing behavior
which occurred in the same experimental
setting. Stanton and Litwak
(22) expanded the distance between
these types of behaviors by studying
the relationship between role playing
occurring in an experimental setting
and actual behavior occurring in a
natural setting. They were interested 
in studying autonomy as an aspect 
of stress-tolerance. They used a test 
which consisted of three standard 
stress situations. Raters were asked 
to note the number of nonideal be- 
haviors which occurred during the 
enactment. The validity of the test 
was determined by correlating the 
measure of autonomy so obtained us- 
ing a group of foster parents as sub- 
jects with ratings of autonomy made 
by social caseworkers who had con- 
siderable contact with the foster par- 
ents. A correlation of .82 was ob- 
tained. A correlation of only .48 was 
found between the caseworker’s rating
and the rating of the homefinder,
who interviewed the foster parents
in their home. In a small class of 
eight students, a correlation of .79 
was found between ratings of auton- 
omy determined from role playing 
and ratings made by close friends. 

It is also interesting to note that 
while raters who saw only role-play- 
ing scenes referring to autonomy were 
able to make ratings that correlated 
.82 with caseworkers’ ratings, raters
who saw other role-playing scenes
in addition, were able to make ratings
which correlated only .40 with
caseworkers’ ratings. This suggests
that the additional information which
raters received contaminated, rather 
than clarified, the rater’s perception 
of the role-playing situation. 

Kelly and Fiske (13) attempted to use role-playing behavior as a basis
for making predictions of future success in clinical psychology. In
their large-scale assessment program they utilized a battery consisting
of standardized paper-and-pencil tests, projective tests, interviews,
written background material, and situational (role-playing) tests. Two
kinds of validation are provided by their data. First, one can consider
the correlation between the situational test rating and the over-all
rating obtained by the use of the whole test battery which, in this
case, is the validating criterion. It was found that observers seeing
only the situational tests could make ratings as good as those made by
other observers who read the subject’s background credentials and
conducted a preliminary interview with him, or who had seen his
background credentials, his objective and projective test results, and
had read his autobiography. Second, one can consider the correlation
between the situational test rating and the criteria utilized to
establish later successful performance in clinical psychology. The
correlation between the over-all score based on the test battery and
the validation criterion was .34. The
correlation between the situational test and the validating criterion
was [illegible]. This is not impressive since an equivalent correlation
was obtained by using either the credential file or a standard
intelligence test score. This low correlation may reflect the fact that
the role-playing situations used were unrelated to the validating
criterion in content.

In summary, it can be said that there is some evidence which indicates
that valid predictions of interpersonal behavior can be made from
role-playing assessment procedures. The development of such procedures
has, to date, been very slow and consequently the problems involved
have hardly been explored.


ROLE PLAYING AS A METHOD FOR
PRODUCING PERSONALITY
CHANGE


Role playing has been used to produce personality and behavioral change
in a wide variety of settings varying from leadership and teacher
training to psychotherapy with neurotics and psychotics. Comparatively
little experimental work has been reported, though application has been
extensive.

Harrow (10) studied the effect of role playing on 20 schizophrenics
organized into two groups that met 25 times over a two-month period. A
third group of schizophrenics was used as a control. Each individual
was given the Rorschach, the MAPS, and a special role test before and
after the role-playing series. Although statistically significant
results were not obtained, a group of judges looking at the
before-after data found evidence of the development of more realistic
perception of and interest in the outside world as well as an increased
ability to handle personal problems.

Jones and Peters (12) in a similar 
experiment studied the effect of role 
playing on a group of twelve schizo- 
phrenics and a group of eight lobot- 
omy patients. Another group of 
schizophrenics was used as a control. 
The role-playing sessions were held 
once a week for four months. Before- 
after evaluation was made using the 
Porteus Maze, the Mirror-Tracing 
Test, Gardner Behavior Chart, Ror- 
schach, and the Draw-A-Man Test. 
Again, although statistically signifi- 
cant results were not obtained, it was 
found that all tests except the Draw- 
A-Man Test indicated improvement. 

Sause (21) has investigated the ef- 
fect of role playing on a group of 15 
normal student teachers. At the be- 
ginning of each role-playing session 
the group members selected a prob- 
lem related to student-teacher inter- 
action as the focus of the session. 
They then each wrote down their 
solution to the problem. Following 
this they role-played the problem 
and again wrote down their solution. 
Finally they rated their satisfaction 
with the meeting. Four judges rating
the written solutions before and after
a given role-playing session were
practically unanimous in finding an
improvement in the solutions after
role playing. The judges agreed with
each other 80 per cent of the time. It 
was also found that the quality of the 
solutions reached before role playing 
improved as the sessions progressed. 
Eighty-seven per cent of the student 
teachers were rated as having better 
solutions before role-playing at the 
twelfth meeting than at the third, an
indication that a transfer of training
may have occurred. 

Maier (15), also working with nor- 
mal groups, has tested the effect of a 
specialized form of role playing (Multiple
Role Playing) in helping groups
reach satisfactory solutions to common
problems. Using this technique
he found that 42 out of 44 groups 
reached a solution, and that only 
four per cent of the group members 
were dissatisfied with the solution 
reached. 

Because of either small sample size, 
design, or lack of underlying relation- 
ship, none of the four preceding stud- 
ies provide statistically significant 
results. In addition, two of them fail 
to utilize a control group. It is best, 
therefore, to consider them as ex- 
ploratory in nature. 

A more satisfactory study from the methodological viewpoint was that of
Janis and King (11). In their design they utilized 90 college students
divided into groups of three. Each student gave a short talk to the
other two on an assigned topic. Each student, therefore, spoke once and
listened twice. The topics were assigned in such a way that the student
speaking presented a point of view contrary to his own as determined by
a previous opinion questionnaire. At the end of each speaking session
(i.e., role-playing session) the opinions of the listeners and the
speaker were reassessed.

The results indicated that, for two of the speaking topics used, there
was significantly more change of attitude in the speaker than in the
listener. For the third topic the change was not significant, but the
speakers expressed greater confidence in the sureness of their opinions
than the listeners. It was also found that individuals who improvised
most and who were satisfied with their performance as speakers
underwent the greatest amount of opinion change. There are two
difficulties in generalizing the results of this study. First, speaking
on an assigned topic before two other people is a very limited form of
role playing. It involved only a one-way interaction between the active
speaker and the passive audience. The typical role-playing situation
consists of at least two-way active interaction. Second, the opinions
changed were unimportant. They dealt with the number of movie theaters
that would exist in a few years, the size of the probable meat supply
in the following year, and the length of time that would be required to
find a cure for the common cold. It is probable that none of these
topics was of central importance to the students tested. If this had
been the case the students might have shown greater resistance to
change.

The results obtained by Janis and King were replicated by Rosenberg
(19) under less artificial conditions. Using three standard
role-playing situations with three groups of [number illegible] persons
each, group members were assigned as role players, identifiers
(instructed to identify with the role players), and active observers
(instructed to watch the action as objectively as possible). A special
group of control observers was assigned simply to watch the action. The
role-playing situations involved such topics as labor-management
relations, guidance problems, etc. After each role-playing session a
questionnaire was given to all group members. This was followed by a
carefully observed nondirective group discussion about the role
playing. Using the data so gathered it was found that the role players
were most susceptible to change. They were also found to be the most
productive in the discussion after the role playing, to have certain
emotional biases of judgment and perception, and a tendency to support
their own actions in the role playing. The identifiers were found to be
most critical, almost as productive as the role players during the
discussion, and to have an awareness of the dynamics of the interaction
process second only to the role player. The active observer was found
to be uninvolved and therefore unemotional with reference to the
enactment. He was as critical as the identifier but relatively unaware
of the dynamics of the interaction process. Finally, the control
observer was found to be indifferent, unproductive during the
discussion, and relatively unaware of the interaction dynamics. An
interesting implication of this study is that understanding of the
dynamics of interaction processes is associated with involvement in
those processes. It is also interesting to note that individual
involvement was found to be a function of position assignment.

While Rosenberg was concerned with the differences induced by
experimental position within the role-playing situation, she did not
consider differences between individuals in the same position. These
differences have been studied by Luszki (14) who compared observers who
were good at understanding the interaction dynamics in a role-playing
scene with those who were not. She found that good observers, as
distinguished from poor ones, (a) were good at determining how others
looked at themselves, (b) were good at determining how others looked at
them, (c) were well adjusted, (d) evaluated themselves as others
evaluated them, and (e) were consistently and favorably perceived by
others. Further it was found that persons who were good judges of how
others felt about them were able to identify most successfully with
role players. Persons who were good judges of how others felt about
themselves were found to be good at observing the actions of role
players critically.

Brown (8) has explored the effect of the role-playing situation on the
success of the role player’s performance. He did this by asking his
subjects to reverse roles with each other and having them evaluate how
well the other person succeeded in portraying them. It was found that
success was inversely related to the realism of the role-playing
situation. The more unusual the situation in which the role was
portrayed, the better was the portrayal.

In summary, the findings in this area are sketchy and essentially
suggestive in nature. The evidence indicates that role playing may
produce behavioral and personality change, but this has certainly not
been established with any degree of confidence. The evidence also
suggests that the effect of role playing on the individual is related
to his personality and to his position in the role-playing situation.
In addition, the success of the role player may be related to the
setting of action in the role-playing situation.


SUMMARY

Perhaps the most striking impression to be gained from a review of the
experimental studies of role playing is their scarcity. Except for the
few studies reported here, effort among practitioners of role playing
seems to have been devoted to the exploitation of its various
applications.

With reference to personality assessment, there is some sound evidence
for believing that reliable and valid role-playing tests can be
developed. Such tests are, however, still in their infancy, and it is
toward the development of these tests that future effort might
profitably be directed.

With reference to personality change, it can only be said that there is
as yet little supportive evidence. The studies which do exist suggest
the possibility of such change and indicate a few of the relevant
variables that may be involved. Much effort is needed in the
exploration of role playing as a method for producing change. If the
possibility of such change were established a number of related topics
become pertinent. For example: What personality characteristics are
most affected by role playing? Are all role-playing techniques equally
effective? Is role playing less effective in producing change if it is
preplanned? What is the effect of the audience on the role players?
These questions, and others which will readily occur to the reader, are
open for experimentation. But they all rest on the assumption that role
playing can produce personality change, and this has not as yet been
conclusively demonstrated.

Despite the somewhat pessimistically worded summary in our last review
of the W-B² (97), the avalanche of research studies with this
instrument has abated very little. The bibliography that accumulated
attests to that. Moreover, this fact is a testament to the vitality of
the instrument and to the tenacity of its users, especially as a medium
for research. The studies, again, deal with the entire range of
interests, with a weakening in some areas and a new emphasis in others.
Thus, the topic of “short forms” has been nearly abandoned whereas sex
differences is an entirely new topic of investigation. Generally, more
attempts have been made to examine the nature of intelligence. These
and previous research have been instrumental in producing a larger
number of studies characterized by greater care and sophistication.

1 
Ea htonen July 1959. 
2 The abbreviation, W-B, will be used throughout to indicate the
Wechsler Bellevue Intelligence Scale (Form I). The names of the
subtests also appear in abbreviated form throughout the paper. The
single letters I, C, A, D, S, and V stand for the verbal subtests:
Information, Comprehension, Arithmetic, Digits, Similarities, and
Vocabulary, respectively. The two-letter combinations PA, PC, OA, BD,
and DS correspond to the following subtests: Picture Arrangement,
Picture Completion, Object Assembly, Block Design, and Digit Symbol.
FS IQ, V IQ, and P IQ stand for Full Scale IQ, Verbal IQ, and
Performance IQ, respectively.


A review by Watson (127) ap- 
peared in the interval since our last 
review. Our attempt will, again, be 
to make an exhaustive and critical 
examination of the studies in this 
area. Many of these would probably 
not have been published had authors 
and editors heeded the recommenda- 
tions of our previous review. We have 
taken care to prepare a complete 
bibliography so that taken with the 
earlier two reviews (96, 97) the litera- 
ture should be well covered. 

The organization of the present 
review resembles closely that of the 
previous one. The first half deals 
with the W-B as a test of general 
intelligence, whereas the second half 
of the article is devoted to studies 
with sundry classifications of deviant 
and disordered subjects. 


THE WECHSLER-BELLEVUE AS A
TEST OF GENERAL INTELLIGENCE


Studies of Reliability 

Recent interest in practice effects 
and the dependability of subtest 
scores has filled the previously ob- 
served lacunae of reliability data on 
the W-B. Most of the currently re- 
viewed studies present reliability co- 
efficients for the subtests as well as 
for the three IQ scales. 

Test-retest agreement. The two 
studies by Steisel (117, 118) investigated
the effect of practice on W-B scores.
[...] matched pairs of [...] superior
intelligence. The first of each
pair was retested after 14 days while 
the retest interval for the second of 
each pair was 77 days. Practice 
effects on all three scales appeared for 
both groups. Mean increase in FS 
IQ was eight to nine points. The P
IQ increases were approximately 
double those for the V IQ. Steisel 
(117) points out that “. . . in a retest
situation the examiner should place 
more trust in the verbal scores... .” 
Differences in the retest scores after 
the short and long intervals revealed 
that only the score on the A subtest 
was a function of the recency of ad- 
ministration. The author concludes, 
“The findings in general corroborate 
those of previous studies in indicating 
that the significant gains in retest 
scores are maintained up to approxi- 
mately three months....” A simi- 
lar study, but with mental defectives, 
is reported by Hays and Schneider 
(56). They employed different retest 
periods of two, four, six, and eight 
weeks. The results also suggest that 
the different retest intervals do not 
affect the IQ increases differentially.
The W-B Form I FS IQ showed an 
increase of 7.6 points due to practice 
as compared with only 4.4 for the 
Form II. 

Split-half reliability. When Webb 
and De Haan (129) reported higher 
split-half reliability coefficients for 
psychotics than for normals, they 
provoked much thought. Helmick
(57) criticizes their failure to equalize
the range of talent for the two groups.
Webb (128) confidently replies that 
Helmick’s suggestion of equalizing 
the ranges of talent would create a 
hybrid sample which was unreal and 
could never be representative of an 
actual population of people. 

Perhaps less interesting but more 
important than these asides are the 
main features of the study. Webb 
and De Haan (129) employed 100
subjects to provide reliability coefficients
for the subtests. All the subtest
coefficients were statistically
significantly different from zero for 
the paranoid subjects, while only PA 
failed to show significance with the 
normal half of the sample. The V 
subtest was the most reliable for both 
samples. Only two subtests showed 
much difference in reliability for the 
different subject sampling. These 
two subtests, PA and PC, were more 
reliable for the psychotics. Botwinick 
(14) utilized Webb and De Haan's 
normal group, a group of 50 older nor- 
mals, and a group of 31 patients with 
mental disorders of the senium. Unfortunately sex could not be held
constant for all three groups. Corrected split-half coefficients were
calculated, and only C failed to differ significantly from zero. The
author's findings were characteristic for this type of study. The
coefficients were larger for the older subjects and were still greater
for the senile disordered group. As the subtest variability increases
due to aging or mental changes associated with illness, the reliability
coefficients seem to increase in magnitude. The reliability of some
subtests is not as high as might be desired. Thus, it is not surprising
that the authors (14, 129) are skeptical of employing diagnostic
pattern analyses that are based upon the more unreliable of the
subtests.

Summary. The additional research covered in the current section
provides sufficient information to guide the clinician. The studies
have been well designed and are fairly conclusive. The IQ scales seem
reliable, although subject to practice effects. The practice-effect
increment does not depend greatly upon the size of the test-retest
interval provided the retesting occurs within three months. Practice
effects are about twice as great on the performance section. Split-half
reliability coefficients for the various subtests show
quite a range of values. Some of the 
smaller coefficients cast doubt upon 
the usefulness of diagnostic pattern 
analysis which is based upon the more 
unreliable subtests. Additional work 
in the area of reliability should be 
well designed and comprehensive. 


Correlation With Other Tests 


Other Wechsler scales. The W-B is 
compared with the Form II in a com- 
prehensive study by Gerboth (45). 
Retest results disclosed that when 
Form II was administered first there 
was a significant increase in the subse- 
quent W-B scales scores. The scores 
were uniformly higher by about five 
points for all scales. Surprisingly, Hays and Schneider (56) found this
same elevation of W-B scores when they preadministered the Form II.
This is surprising, because instead of superiors they employed mental
defectives. For some reason preadministering the Form I does not
elevate the subsequently obtained Form II scores nearly as much. A
satisfactory explanation of this special effect has not been
forthcoming.

Returning to Gerboth’s study (45), the intercorrelations between the
subtests suggest that the I, C, A, and PA subtests were least
comparable between the W-B and the Form II. The correlation between
FS IQ's was about .75. This is fairly high since the range of talent
was only half that for an unrestricted sample.
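
To make the point about restricted range concrete, the Python sketch below applies the standard correction for direct restriction of range, r_c = r u / sqrt(1 - r² + r² u²), where u is the ratio of the unrestricted to the restricted standard deviation. Treating "half the range of talent" as u = 2 is an assumption made here for illustration; the review itself reports no corrected value, and the function name is hypothetical.

```python
from math import sqrt


def correct_for_range_restriction(r, u):
    """Estimate the correlation in an unrestricted sample.

    r: correlation observed in the restricted sample
    u: unrestricted SD divided by restricted SD (u >= 1)
    """
    return (r * u) / sqrt(1 - r**2 + (r**2) * (u**2))


# Assumed: the sample's spread was half that of an unrestricted sample.
estimate = correct_for_range_restriction(0.75, 2.0)
print(round(estimate, 2))  # about 0.91
```

Under these assumptions an observed correlation of .75 would correspond to roughly .91 in a sample with the full range of talent, which is the sense in which the reported figure is "fairly high."
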

Another Wechsler scale, the Wechsler Intelligence Scale for Children
(WISC), was compared with the W-B by Delattre and Cole (31). With a
wide range of intelligence they found a correlation of .87 between the
FS IQ's. The mean FS IQ was six points higher for the WISC.
Intercorrelation of corresponding subtests of the two tests provides
indices of comparability. PA was particularly low with
an r of only .19 between the two tests. 
The rather moderate correlations pro- 
voke the authors to wisely invoke 
caution in applying W-B patterns 
and signs to WISC data. An almost 
identical design with a narrower 
range of intelligence is found in a 
more recent study by Knopf, Mur- 
fett, and Milstein (68). Their results 
provide good confirmation of the 
earlier findings of Delattre and Cole 
(31). They also emphasize the lack 
of comparability between corre- 
sponding subtests of the two tests. 
Vanderhost, Sloan, and Bensberg 
(124) report on the gross compara- 
bility of the W-B and the WISC for 
a mental defective group. Their ex- 
perimental design neatly counter- 
balances extraneous factors. With a 
restricted range of IQ they report a 
correlation of .72 between FS IQ's. 
They found the mean WISC to be 
1.6 IQ points higher than the W-B, 
a difference not statistically signifi- 
cant. 
Other intelligence tests. Alderdice 
and Butler (3) found a correlation of
.69 between the W-B FS and the
Revised Stanford Binet (S-B), Form 
M. However, the correlation is some- 
what attenuated by the narrow range 
of IQ's in the sample. Correlations 
between the W-B and the S-B are 
considerably higher when an appro- 
priate range of talent is employed 
(96, 97). As encountered in the past, 
the W-B IQ was found to be some- 
what higher than that for the S-B (3). 
Duncan (37) presents a table for con-
verting W-B vocabulary scores to
S-B vocabulary scores, thereby elimi-
nating the necessity of administering
both vocabularies when both tests
are included in a battery.
The Army General Classification 
Test (AGCT) was compared with the 
W-B by Tamminen (121). A correla- 
tion of .91 was obtained between the 
two tests when the sample was cor-
rected for range of talent. The AGCT
appears to be more closely related to 
the V IQ than to the P IQ of the W- 
B. The American Council on Educa- 
tion Test (ACE) correlated .61 with 
the W-B FS with a restricted range 
of IQ’s in the sampling by Merrill and 
Heathers (86). Gerboth (45) reports 
a comparable correlation between 
the ACE and the W-B but also em- 
ployed a narrow range of talent. A 
thesis by Smith (109) reports the
relationship between the SRA Pri-
mary Mental Abilities Test and the
W-B but was not available for review.

Knott et al. (69) compared the W-B 
with the seven-test Kent Battery. 
The correlation with the FS IQ was
.87, but this is difficult to interpret
since the range of talent of the sample 
was not reported. Delp (32) com- 
pared only the EGY of the Kent Bat- 
tery with the W-B. The correlation 
was .65, and the range of talent was
only slightly reduced. Raven’s Pro-
gressive Matrices correlated .55 with
a short form of the W-B according to 
Levine and Iscoe (71). They state, 
“The Progressive Matrices appeared 
to tap areas of intelligence most 
closely related to BD and not signifi- 
cantly related to C.” In another 
study Levine and Iscoe (72) found 
the correlation between the W-B and 
the Matrices ‘‘... not high enough 
for individual prediction... ,” but 
this was with a deaf sample. Desai 
(33) compared the Matrices against 
just the V IQ of the W-B. He ob-
tained an r of .65 which was corrected
for attenuation. Allen, Thornton, 
and Stenger (5) found the surpris- 
ingly high correlation of .86 between 
the Ammons Picture Vocabulary and 
the W-B. Rubin (101) presents the 
results of his comparison of the W-B 
and quantitative features of the 
H-T-P. With an appropriate range
of talent he obtained a correlation of
.67. The mean IQ’s were rather com-
parable.
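
The correction for attenuation mentioned above in connection with Desai's coefficient can be sketched as follows. The review does not state the formula or the reliabilities involved, so the sketch uses the classical Spearman correction, and every number in it is hypothetical rather than taken from Desai (33) or any other study cited here.

```python
import math


def correct_for_attenuation(r_xy, reliability_x, reliability_y):
    """Classical Spearman correction for attenuation.

    r_xy: observed correlation between two tests.
    reliability_x, reliability_y: reliability coefficients of the two tests.
    """
    if not (0.0 < reliability_x <= 1.0 and 0.0 < reliability_y <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(reliability_x * reliability_y)


# Hypothetical values chosen only for illustration.
observed_r = 0.50
corrected_r = correct_for_attenuation(observed_r, 0.80, 0.90)
print(round(corrected_r, 3))
```

The corrected value is always at least as large as the observed one, which is why a reader needs to know whether a reported coefficient has been corrected before comparing it with uncorrected coefficients from other studies.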

Other measures. Storrs (119) re-
ports interesting relationships be-
tween the W-B scales and the Gen-
eral Aptitude Test Battery (GATB).
The Battery showed the interest,
verbal, and numerical factors to be
related to the V IQ. The test factors
related to P IQ were spatial, form
perception, clerical perception, aim-
ing, and motor speed. A correlation
of .58 was found for the relationship 
between the total score for the Bat- 
tery and the W-B FS. The correla- 
tion with the P section was slightly 
higher than with the FS. 

Interest in the correspondence be- 
tween W-B and intellectual aspects 
of Rorschach performance is en- 
countered in the work of Abrams (1). 
A multiple-regression equation is pre- 
sented which predicts IQ score. The 
Rorschach elements involved are 
F+%, M, W, and R. The multiple 
R is .53. 

Only two studies compared W-B
scores with criteria other than tests.
Gerboth (45) found that both forms
of the W-B were significantly related
to grade-point average in school. The
correlation for Form I was .29, while
that for Form II was .26. These r's
are based upon a narrow range of
talent. Although Merrill and Heath-
ers (86) also sampled intelligence nar-
rowly they obtained a correlation of
.58 between W-B V IQ and grade-point
average.

Summary. As time passes the W-B
and its later forms seem to attain an
increasingly venerable position. It
is now the well-accepted standard of
intelligence evaluation for adults. As
many tests of varied composition are
correlated with the W-B it becomes
clear that the W-B occupies a central
position in evaluating the factor of
general intelligence. When correc-
tions are made for the ever-present
restricted ranges of talent, the cor- 
relations between the W-B and other 
tests argue well for both the reliabil- 
ity and validity of the test. 

The principles of test construction
employed with the W-B have been so
successful that an alternate form
and a children’s scale (133)
have been developed. These two
forms of the test seem to be compara-
ble to the W-B in their capacity to
evaluate general intelligence. How-
ever, the extension of W-B-derived
subtest pattern interpretations to
these tests has not been too well sup-
ported.

The authors once again bemoan the
failure of investigators to report the
range of intelligence of the samples.
They are especially distressed that so
few of the writers correct their coeffi-
cients for range-of-talent. Too often
the correlation is meaningless as
presented in the report. Too often
the tests in the file determine the
study rather than the nature of the
problem determining the data to be col-
lected. The sampling of one study
(31) is exemplary, while still others
reveal much thought behind the ma-
nipulation of experimental variables
and the statistical analyses (3, 31, 45,
56).


Short Forms 


Surprisingly, only three papers (3,
29, 53) propose new short-form com-
binations of W-B subtests. First of
these is Gurvitz's (53) thorough sift-
ing of all possible two-subtest com-
binations. With a large sample one
combination proved best and was confirmed
on cross validation. The second of
these proposals is the practical one
by Cotzin and Gallagher (29). They
find that the four-test form C-S-PA-...
correlates .94 with the FS for a
cross-validation group of defectives.
Both the V IQ and P IQ can be esti-
mated from this form, which is felt
to be of particular importance when
working with mental defectives.
Their regression equations provide
appropriate raw scale conversions to
IQ. Last is the proposal by Alderdice
and Butler (3), who developed the
I-D-S-OA combination so that S-B
scores could be best predicted for
mental defectives.

Previously proposed short forms 
are evaluated in an additional three 
papers (58, 59, 75). McKenzie (75) 
studied the V-I-S-BD form proposed 
by Kriegman and Hansen. He con-
cludes this form is a good means of
estimating the intelligence of mental 
defectives and that it provides more 
diagnostic information than similar 
instruments. In the same year both 
Herring (58) and Hilden (59) re- 
ported thorough empirical investiga- 
tions of previously proposed forms. 
Herring studied the short forms with 
both mental-hygiene-clinic patients 
and normals while Hilden used ...
... more extensive study by Herring (58)
... four,
and five subtest combinations.

Summary. This review ...
with short forms as ...
previous review (97). Instead of a
deluge of brief unrelated studies it is
gratifying to encounter well-planned
comprehensive studies (3, 53, 58, 59).
Two of the reports (58, 59) involved 
considerable effort directed at bring- 
ing about an integration of work in 
the area of short forms. One of the 
proposed short forms (29) is the first 
to yield an estimate of both V IQ and 
P IQ for use with defectives. Sam- 
pling problems such as range of talent 
and cross validation have also re-
ceived more consideration than for-
merly. This greatly increased sophis-
tication is most encouraging.


Applications With Special Populations


Sex differences. A comprehensive 
study by Norman (89) explores sex 
differences in young superiors. Many 
significant differences in subtests as 
well as for the V and P IQ’s were en-
countered. In general, men have
higher scores than women, and the
greatest difference is apparent on A.
This relatively large sex discrepancy
in A was encountered in college stu- 
dents by Guertin (51) and proved 
puzzling. Vane and Eisen (125) found 
that both delinquent and normal girls 
had lower subscores on A and D than 
the standardization population of 
Wechsler. Strange and Palmer (120) 
employed psychiatric clinic patients 
for their study of sex differences.
Their results agree with Norman’s 
positive findings as do the more re- 
cent conclusions of Goolishian and 
Foster (48) using neuropsychiatric 
patients. Brown and Bryan (18) dif- 
fer in that their normals do not show 
the superiority of the males so clearly, 
and patterning is somewhat different
than found by other authors.

Miscellaneous populations. Geriat- 
ric changes in W-B performance were 
studied by Birren (13) in an unusual 
way. Instead of merely evaluating 
subtest scores he studied the factor
scores of geriatric subjects. His Ver-
bal Comprehension Factor held up
well with advancing age, while his 
Closure Factor (performance) showed 
sizable decrements. As a final com- 
ment Birren states, “It is apparent 
that the W-B does not include tests 
of many of the known primary abili- 
ties. For this reason, it is of limited 
use in attempting to describe age 

changes in the intellect.” 

Colored and white neurotics were
matched for the usual variables by
Davidson et al. (30). The A subtest
and performance scores were lower 
for the colored. Goldstein (47) dis- 
cusses problems encountered in using 
the W-B in South Africa. A great 
many changes had to be made to 
adapt it to this population. Malect 
and Montanari (79) present a similar 
report on the adaptation of the test 
for use in Italy. A report from Cal- 
cutta University (21) cites the same 
sort of problems in adapting the test 
for India. V and PA required the 
most alteration. Aeppli-Tanner (2) 
found she could employ a German 
translation with Swiss adolescents 
without extensive alterations.
Gross quantitative results compare
well with American norms; however,
subtest discrepancies were present.

While not too much has been pub-
lished about the use of the W-B in
vocational counseling, a few studies
appeared during this review interval.
Of particular importance is Patter-
son’s (93) guide for counselors. Mean
W-B scores are presented for each of
16 broad occupational groups by
Simon and Levitt (108). Another study by
Ladd (70) discloses higher V IQ's for
students in academic teacher-prepa-
ration areas as compared with non-
academic teacher-preparation areas.
Those in the nonacademic areas
showed higher P IQ’s than those in
the academic areas. The authors cau-
tiously suggest there might be some
implications for counseling. Merrill
and Heathers (84) present a norma-
tive study setting forth W-B
scores for a college counseling-center
group and also for college
volunteers. They found little differ-
ence in the IQ’s for the two samples.
They concluded that those with per-
sonal difficulties were not likely to be
significantly lower on the W-B.

Summary. Sex differences in
normals and psychiatric subjects
seem to be rather well established. A
study employing a representative 
sample of the general population to 
identify these sex differences appears 
much needed at this time. It would 
be of interest to see whether sex dif-
ferences appeared in the standardiza-
tion results of the new Wechsler
Adult Intelligence Scale. Perhaps dif-
ferences in performance by the sexes
at one level of intelligence are re-
versed in direction at a different level
so that the total mean differences in
performance are minimal. Dif-
ferences in A, D, and PC are repeat-
edly reported (48, 89, 120). Re-
searchers and clinicians should take
note of this and specify sex as well as
age when proposing patterns or ap-
plying pattern analyses.

Some of the more familiar popula-
tions of past research have received
less attention recently, while
new interests appear (2, 47, 49,
70, 79, 84). It seems appropriate
that as the problems of the past are
solved, new interests should appear.
Notable in this progress is the
extension of the W-B to foreign popu-
lations.


Refinements and Critiques


Administration and scoring. Kit-
zinger and Blumberg (66) present a
supplementary guide for administer-
ing and scoring the W-B. It includes
principles and examples as
well as good descriptions of conven-
tional subtest rationale. Cohen (23)
was interested in scoring and admin-
istration biases. He studied samples
of W-B's from 13 examiners and was
able to identify evidence of examiner
bias that would serve to reduce re-
liability. Those subtests that show
lowest reliability were those that pro-
duced the most systematic interex-
aminer disagreement. This raises the
possibility that much of the varia-
bility of subtests that is ascribed to
unreliability may be systematic var- 
iance related to the nature of the ex- 
aminer. 

Item order of the various subtests 
again receives some consideration by 
Norman (89). An incidental finding 
of his larger study was that the order 
of difficulty for superiors agreed bet- 
ter with the order found by Jastak 
than with Wechsler’s (131). Mech 
(83) presents the order of difficulty 
for high school students. He suggests 
that the item order be changed and 
then appends the statistically wise 
but impractical condition that this 
“. . . applies only to students in the
12th grade.” Mech is faced with the 
enigma of establishing an order of dif- 
ficulty that will apply for any group 
of subjects. It would seem necessary 
to revert to Wechsler’s intent of 
establishing the item order on a repre- 
sentative population. Different or- 
ders of difficulty found with small 
restricted portions of the whole popu- 
lation would not then be a basis for 
challenging the originally proposed 
order. Russell (102) proposes a re- 
vised sequence of V words for use 
with neuropsychiatric patients. 

Other subtests have been studied. 
For example, Luchins and Luchins 
(74) examined the effect of varying 
the instructions on the DS subtest. 
This well-designed study permits the 
following conclusions: (a) initial test 
score cannot serve as a reliable index 
of learning, (b) speed emphasis may 
retard performance, and (c) Wechs-
ler’s directions are so ambiguous
that they produce greater variability 
among the subjects as compared with 
the experimental directions. Guertin 
(51) was interested in a different sub- 
test. He studied the effect of instruc- 
tions and item order upon A. Con- 
trary to the investigator's expect- 
ancy, the superior subjects of the 
study were not threatened by en-
countering difficult items early in the
presentation. In fact, they accepted 
the challenge and performed better 
than with conventional order of pres- 
entation. It seems possible that men- 
tally subnormal subjects might show 
just the opposite reaction, but this 
has yet to be verified. 

Eglash (38) studied the relation- 
ship between the FS score and the 
shoes item on C. He finds some justi- 
fication for relating the number of 
reasons given to intelligence. How- 
ever, the spread between one and two 
reasons is such that it would be a 
more appropriate cutting point than 
the one in current use. Following up 
on this suggestion, Armstrong (8) re-
scored the shoes item, and the r be-
tween C and the FS rose from .42 to
.48. The OA subtest was the subject
of the article by Shannon and Rossi 
(106). They suggest a presentation 
method for OA that was completely 
described several years earlier by 
Derner and Aborn and was covered 
in our last review. 

Statistical analysis. An item anal-
ysis with high school students was
made by Mech (83). Item order and
discriminative values are presented
for each item. Stanley (115) discusses
what he feels “every good clinician
should know.” The article deals with
what to tell the psychiatrist when the
V IQ and P IQ are both larger or
smaller than the FS IQ. A rather
mathematical description of the fal-
lacy of averaging averages points out
that averaging the V IQ and P IQ
yields a meaningless “average” IQ
that could not be expected to cor-
respond to the FS IQ.
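
The point about averaging averages can be made concrete with a small idealized model. The sketch below is not Stanley's derivation and does not reproduce the W-B's table-based conversion from weighted scores to IQ. It assumes, purely for illustration, that V IQ, P IQ, and FS IQ are all deviation scores with mean 100 and SD 15, that the Full Scale is the restandardized sum of the two, and that V and P correlate .70. The function names and that correlation are hypothetical.

```python
import math

MEAN_IQ = 100.0
SD_IQ = 15.0  # assumption: all three scales share this mean and SD


def composite_iq(v_iq, p_iq, r_vp):
    """Idealized Full Scale IQ from two equally weighted, correlated scales."""
    if not -1.0 < r_vp <= 1.0:
        raise ValueError("r_vp must lie in (-1, 1]")
    z_v = (v_iq - MEAN_IQ) / SD_IQ
    z_p = (p_iq - MEAN_IQ) / SD_IQ
    # The sum of two unit-variance scores has variance 2 + 2r, so the sum
    # must be divided by its own SD before it is put back on the IQ scale.
    z_full = (z_v + z_p) / math.sqrt(2.0 + 2.0 * r_vp)
    return MEAN_IQ + SD_IQ * z_full


def averaged_iq(v_iq, p_iq):
    """The simple mean of V IQ and P IQ that the text warns against."""
    return (v_iq + p_iq) / 2.0


# Hypothetical case: V IQ = P IQ = 120 and an assumed V-P correlation of .70.
print(averaged_iq(120, 120), round(composite_iq(120, 120, 0.70), 1))
```

In this model the composite lies farther from 100 than either part whenever the two scales correlate less than perfectly, so an FS IQ above (or below) both the V IQ and the P IQ is expected rather than anomalous, and the simple average understates the deviation.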

Cohen (24, 26) cleverly factor-
analyzed the W-B subtests using
three types of psychiatric patients.
He makes a methodological contribu-
tion by demonstrating that the fac-
tors underlying success on a test are
a function of the sample of people
taking the test. The material is pre- 
sented in detail in a relatively non- 
technical fashion in another paper 
(25), and it is highly recommended to 
clinicians. Cohen identified the three 
factors: Verbal, Nonverbal, and Dis-
tractibility. Similar factors were
derived by Birren (13) for a group of
elderly subjects. His factor names
were respectively: Verbal Compre-
hension, Closure, and Rote Memory.
In addition a fourth factor is tenta-
tively identified as Induction. Alder-
dice and Butler (3) factor-analyzed
the performance of mental defectives
but found only a general and a per-
formance factor. Whiteman and
Whiteman (137) report on the appli-
cation of Jastak’s ...
clusters: Reality perception, Psycho-
motor efficiency, Language polarity,
and Motivation. Wheeler (136) also
reports the results of a factor analysis
of the W-B and other instruments.
Summarizing, there seems to be a
fairly good overlap of factors, with a
few factors appearing only with cer-
tain subjects. Most frequently en-
countered are a verbal, a perform-
ance, and an attention factor.
Miscellaneous. Gurvitz (54)
reached his limit of professional tol-
erance and wrote a critical article on
the defects in standardization sam-
pling and procedure for the W-B.
The criticisms are too many to list,
and reading of the original article is
essential for anyone planning to em-
ploy data from the standardization.
He also scolds Wechsler for his care-
lessness in changing subtest direc-
tions between editions of the manual
yet not presenting any restandardiza-
tion. Gurvitz makes a most reason-
able plea for more adequate standard-
ization sampling in future work.

The relationship between learning
ability and W-B performance was
investigated by McLean (76). By
studying neuropsychiatric patients
with both the W-B and conventional
learning tasks, the author was able to
reach several conclusions: (a) higher 
V IQ is indicative of ability to learn 
in verbal situations, (b) higher P IQ 
is indicative of ability to learn in per- 
formance situations, and (c) subtest 
variation is inversely related to learn- 
ing proficiency. The touch of the 
learning psychologist is also encoun- 
tered in the experiment by Burik 
(19). He found that motor test
scores, such as rate-of-tapping, were
most closely related to DS. There
was also good evidence that the in-
cidental learning of the symbols was
not related to the DS score. He con-
cludes that the DS subtest should be
regarded more as a test of visuomotor
coordination than as a test of new
learning. 
Hypnosis was used by Kline (67) to
demonstrate the W-B in action. He
hypnotically regressed and progressed
a woman. W-B’s were
obtained in these two states as well
as at normal age and in a waking
state. Under the varying conditions
the IQ remained surprisingly con-
stant although weighted scores were
necessarily diminished for both the
CA’s of eight and 65 years. The
changes in subtests with progressed
age conformed remarkably well to
clinical expectancy. The DQ also is
appropriate for that age group. Even
item six on BD, which is not in the
correct order of difficulty, was failed
when the subject “believed” she was
of a different age. It would be expected that the
“conforming” subject would select
for failing the last and more complex-
appearing design which employs all
the blocks. Not much is proved, but
this study makes interesting reading.
A thesis by Ficca (39) relating auto-
nomic features to W-B performance
sounded interesting but was not seen.

A series of studies by Stacey and
others (113, 114, 115) investigated
Gerstein’s hypothesis about concep- 
tual performance. Gerstein desig- 
nates the descriptive definition as 
lowest in intellectual requirements, 
while the functional is intermediate; 
and the conceptual definition is most 
associated with high intelligence. 
Subnormals failed to confirm the pre- 
dictions of frequency of occurrence for 
the three types of definitions for V
(114). A similar result was obtained 
when concentrating on S answers 
(113). In the most recent study, 
Stacey and Spanier (115) switched 
to superiors for a study of V re- 
sponses. The functional was actually 
associated with lower intelligence 
than was the descriptive-type defini- 
tion. This finding was in line with the 
earlier two studies (113, 114) and 
suggests that the descriptive-type 
definition is of a slightly higher level 
than the functional definition. 
Summary. There seems to be a 
welcome increase in comprehensive, 
theoretical articles this time (8, 13, 
23, 54, 76, 83, 89). The sophistica- 
tion of researchers seems to be ad- 
vancing. Some evidence of this is 
found in the relatively large number of
factor-analytically designed studies 
in this review (3, 13, 24, 26, 136). 
Also new is the learning theory de- 
sign focused upon test theory ration- 
ale (19, 76). In addition to the more 
comprehensive studies, there is an 
ever-growing body of knowledge about 
the rationale of the individual sub-
tests.


THE WECHSLER-BELLEVUE AS A
DIAGNOSTIC AID 


The use of the W-B as a diagnostic 
aid implies an assumption most re- 
cently expressed by Jastak “ . . . in- 
telligence is not a global trait but a 
general and pervasive part function 
of the personality.” Persistent criti-
cism of such an assumption is ex-
pressed most recently in various re-
search and by Schofield (105).


As a Measure of Emotional Factors 


Brower in 1947 proposed that a 
negative correlation existed between 
the Hy, Hs, and Pd scales of the Min- 
nesota Multiphasic Personality In- 
ventory and the W-B IQ, the prob- 
able rationale being that neurotics 
and persons with character disorders 
would show intellectual impairment. 
Winfield (138) checked this hypothe-
sis and found no support for it.

Other interest in the relationship 
between personality tests and intel- 
lectual performance is confined to the 
Rorschach test. Probably most com- 
prehensive is the study by Holzberg 
and Belmont (61). They hypothe- 
sized a number of relationships be- 
tween Rorschach signs and W-B test 
performance, but only four of their 
many predictions were substantiated. 
They set forth their empirical find-
ings under eight features of Ror-
schach performance with counter- ...
... tionship between H% and PA. How-
ever, ... by Spaner ...
... on the Rorschach ... Five of the ten hy-
pothesized relationships between
Rorschach and W-B performances
were confirmed. Another thesis (122),
not seen, also deals with the relation-
ship between Rorschach features and
W-B performance.

Summary. Attempts to validate
the general assumption that some- 
thing other than a mere IQ may be 
extracted from performance on the 
W-B have been disappointing. How- 
ever, as in the case of the comparison 
between the Rorschach and the W-B,
it might be reiterated that (a) the
Rorschach scores, as statistically
valid measures of personality, have
not been established, and (b) the
rationales for the subtests as pro- 
posed by Wechsler have been chal- 
lenged. 


As a Measure of Change Following 
Therapy 


Markwell, Wheeler, and Kitzinger
(81) recorded the W-B performance
of schizophrenics before and after
prefrontal lobotomy; and Smykal
and Wilson (110) did the same with
electroconvulsive therapy. Mark-
well, Wheeler, and Kitzinger found no
statistically significant differences be-
tween pre- and postoperative W-B
performance; Smykal and Wilson
found some gross group ameliora-
tion but considerable individual dif-
ferences to attenuate the positive
aspect of their findings.

In the Smykal study, the W-B was
administered before treatment and
after the fifth and tenth administra-
tion of shock (shock being given
twice a week for five weeks). Inter-
esting results demonstrated (a) the
percentage of IQ's of 35 or less de-
creased 50 per cent after the last
shock; the Markwell group found that
lobotomy increased the testability of
the patients by 20 per cent, (b) the
highest point of efficiency was at the
midpoint of treatment, after the fifth
shock; and (c) the preshock pattern
corresponded to Rapaport’s chronic
schizophrenic group, while the post-
fifth shock correlated highly with his
acute group.

Summary. Research failed to
demonstrate differences in W-B
performance reliably following either
prefrontal lobotomy or ECT. How-
ever, the generalization of the two
researches, reviewed in this section,
must be delimited by the restricted
size and nature of their sample popu-
lations. What the research does at-
test to is the fact that, as the result 
of therapy, the patient does become 
more amenable to testing. 


Mental Deterioration Index 


Validity. Tests of the validity of 
Wechsler’s (131) Mental Deteriora- 
tion Index (MDI) have produced 
generally negative results even though 
Howell (62) found that, “ . . . it may 
be tentatively concluded that Wechs- 
ler’s assumption of deterioration at 
all levels is correct.” Corsini and 
Fassett (28) culled the test protocols 
for 100 inmates of San Quentin pris- 
on. In this cross-sectional study, 
the total population was divided into 
twelve five-year periods, starting 
with ages 15-19 and going to 70-74. 

These investigators found that by
and large the verbal test scores in-
crease with age while performance
scores decline, suggesting a V-P
ratio rather than the “hold,” “don’t
hold” ratio, as the more sensitive 
Measure of mental deterioration. 
These findings are in good agreement 
with those reported by Birren (13). 

Bensberg and Sloan (10) tested the 
validity of Wechsler’s MDI on a 
group of mental defectives. They 
culled their protocols from the files of 
the Lincoln State School and Colony, 
excluding those individuals who had 
not received S-B IQ’s of 42 or better,
and who showed evidence of psychot- 
ic or organic processes. The results 
with the W-B were compared with 
those on the Arthur Point Scale and 
the S-B. The results suggested (a) 
although the Arthur and the S-B 
demonstrated a negatively accelerat- 
ing function with regard to CA incre- 
ments, the W-B scores tended to in- 
crease, rather than decrease, and (b) 
the W-B scores did not differentiate 
the older group (30-55) from their 
earlier MDI scores obtained between 
the ages of 15 and 24. The authors 
conclude that, “. . . the ‘normal
deterioration’ which Wechsler found 
for older subjects may have been an 
artifact due to faulty sampling, at 
least for the lower intelligence groups.” 

The study by Fox and Birren (41), 
however, suggests that when psy- 
chotic or neurological dysfunctioning 
is excluded, a group of sixty-year- 
olds are significantly different in sub- 
test patterning from a group of 69- 
year-olds. Fox and Birren find that 
only two of Wechsler’s “hold” group 
“hold,” viz., V and C. 

Glik (46) offers a very interesting 
thesis to explain the inability of meas- 
ures of deterioration, as Wechsler’s 
MDI, to reflect intellectual deteriora- 
tion adequately. He assumes that 
the manner of measuring present func- 
tioning is in error. Glik reasons that 
the ability to recognize the meaning of 
items one once knew is a more sensi- 
tive measure of deterioration than 
measuring what items the subject 
can actually recall in the present. 

Testing this assumption on the V
and I subtests, Glik found a signifi-
cant t between recall and recognition
on I items only. Glik generalizes his
results to verbal questions without 
any really adequate substantiation 
of this point, although his proposals 
surely merit further study. At any 
rate, Glik’s results do suggest that a 
discrepancy score between recall and 
recognition may be more meaningful 
than the method of recall now used 
with V and I items.

In senile psychosis. The major
question to be asked here is: does the
addition of a psychotic process super-
imposed upon the “normal” senescent
decline significantly alter the W-B
performance?

Berkowitz (11), Botwinick and Bir- 
ren (15), and Doerken and Green- 
bloom (36), tested the efficiency of 
the W-B to differentiate “normal”
and “abnormal” seniles and found
positive results, while Botwinick and
Birren (16) found that the MDI does 
not differentiate a control group of 
seniles from an experimental group of 
seniles with psychosis. The investi- 
gators of the latter study as well as 
Birren (13) emphasize that mental ill- 
ness in these aged caused a greater 
deficit in P IQ than V IQ although 
there was good evidence of inroads on 
Verbal performance also. 

In organic brain disease. Anderson 
(6) and Ptacek and Young (94) con- 
ducted research in attempts to de- 
termine how effective the MDI is in 
identifying mental deterioration in 
organics. The results point to a re- 
jection of the sensitivity of the MDI. 

Anderson (6) addressed himself to 
the effects of laterality of localization 
of brain damage and its effect upon 
the MDI, with the results being nega- 
tive. Although the subjects were 
identified neurologically as organic, 
they were not cross identified by the
MDI.

Wheeler and Wilkins (135) tested 
the Hewson ratio, an empirical for-
mula devised to differentiate organics
from nonorganics, but found it to be
lacking in ability to make individual 
predictions. 

In neurosis. O'Connor (90) tested
the effect of neurosis upon the MDI.
The results indicate that the “hold”
versus “don’t hold” ratio is inapplica-
ble to neurosis.

Summary. Research attempting
empirically to define the utility of the
MDI has eventuated in negative results.
The MDI has proven to be insensitive to
deterioration in senility with or
without psychosis, organic brain
disease, and neurosis. The research
findings did suggest that the present
“hold” versus “don’t hold” ratio was not
sensitive to changes due to a
deteriorative process, but that the
verbal-performance dichotomy was.

Regardless of the nature of the
findings per se, as one looks back over
the various investigations conducted
in this particular area with the W-B,
one is struck by the number of important
methodological criticisms that may be
leveled against the experimentation. In
the first place, two validity studies
use highly restricted sample
populations. Both these investigations
rejected the utility of the MDI with
their populations, but it is hard to
generalize as to the validity of the MDI
with other populations.

It is most difficult to propose seri- 
ously an index of deterioration that 
would cover the full range of intelli- 
gence and age. Probably Wechsler 
was too ambitious. It is still more 
difficult to make cross comparisons of 
studies with such wide ranges of vari- 
ables which are known to relate to 
subtest patterning, viz., IQ, CA,
education, and vocational skills. Such
variables only serve to confound the
data and confuse the research worker.


Even more difficult to interpret are
results from a single sample composed
of both psychotics and nonpsychotics,
organicity due to trauma and organicity
due to endogenous causes, and even
varying degrees of organicity and
central involvement (94). Anderson, in
another study (7), attempting to define
the effect of laterality of localization
of brain damage, has this to say about
his own sample, “. . . non-dominant
hemisphere sub-group . . . was
significantly older than the dominant
hemisphere group,” and “. . . although
brain damage was established beyond any
reasonable doubt . . . the criterion of
unilaterality is probably relatively
poor.”

It appears that the conclusions
drawn from the results of the
investigations attempting to define the
effectiveness of the MDI must be
tentative because of restricted or
questionable sampling. These have
apparently introduced distortion into
the subtest performance, confounding
the results. Yet there is uniform
agreement that the MDI was not
effective with those samples employed.


Scatter and Pattern Analysis 


Considerable research has been 
conducted on patterns of perform- 
ance on the subtests, and on scatter. 
Since the last review, emphasis has 
been placed on empirically defined, 
objectively derived analyses of pat- 
terns, and considerable clinical and 
research acumen has been canalized 
into this intriguing question. 

General findings. Jastak (64) finds 
that pattern analysis is more reliable 
if raw, rather than weighted scores, 
are utilized. Alimena (4) and Jack- 
son (63) use a z-score transformation 
of the weighted scores to alleviate 
the scatter inherent in the test itself. 
Bradway and Benson (17) are concerned
with the extreme individual deviations
present in Rapaport’s findings for the
diagnostic groups.

Jastak (64), Monroe (88), and 
Wittenborn and Holzberg (139) all 
found that emotional adjustment was 
inversely related to the amount of 
scatter, all using various criterion 
measures. Collins (27) found a direct 
relationship between variation on 
EEG patterns and variation in IQ 
scores. Purcell, Drevdahl, and Pur- 
cell (95) found a Pearson r of .31 
(significant at .01 level) between 
measures of anxiety (Hypochondriasis,
Depression, and Psychasthenia Scales on
the MMPI) and scatter on the subtests,
and Moldawsky and
Moldawsky (87) found that D is an 
apt indicator of anxiety. 

However, neither Shoben (107) 
nor Matarazzo (82) found any rela- 
tionship between anxiety and per- 
formance on the subtests. Kaldegg 
(65), Love (73), and Wittenborn and 
Holzberg (139) found no significant
relationship between variability of
subtest performance and pathology; 
and Rakusin (98) found “ . . . a lack 
of uniformity in normal scatter pat- 
terns on vocabulary scatter.” Raku- 
sin, incidentally, found a significant 
difference on the patterns of subtests, 
for a clinic (maladjusted) and non- 
clinic (adjusted) group, but also dis- 
covered that these apparent differ- 
ences disappeared when the effects of 
age and IQ on total scatter were alle- 
viated by the method of multiple 
covariance. Rakusin concludes (98), 
“The differences obtained could not 
be attributed to maladjustment.” 
Merrill and Heathers (85) conclude, 
on the basis of their research on col- 
lege students, that “scatter is to be 
expected in groups of above average 
or superior adults.” Jastak found 
that neurotics, schizophrenics, and 
organics were not differentiated by 
Wechsler’s signs, and Wittenborn 
and Holzberg (139), in a chi-square 
analysis of the performance of paranoid
schizophrenics, manic-depressives,
alcoholic psychotics, and psychopathic
personalities, found their individual
group performances indistinguishable.

Holzberg, Alessi, and Talkoff (60)
tested the ability of seven judges to
predict premorbid intelligence of ten
psychotic patients at Connecticut
State Hospital. In general, the
judges made their evaluations on the
basis of the amount of intertest scatter
in the individual protocols.
Intercorrelations between the judges’
estimates ranged from .15 to .87.
However, such a matrix of correlations
tells little about the over-all
intercorrelation of the judges, i.e.,
how well they were doing as a group.
Hence, one of the present writers
(GF) simply reproduced the complete
correlation matrix from that presented
in the article, converted the rho’s into
ranks, and computed W, a coefficient of
concordance, a nonparametric
multiple-rank correlation technique
which yields an over-all coefficient of
correlation. W was .27 (not
significant), indicating that there was
poor agreement between the judges. The
poor results of the Connecticut research
may be a function of the inexperience of
the judges, since four of the seven were
interns, the other three being staff
members.
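
As an illustration of the statistic named here, the following is a minimal sketch of Kendall's coefficient of concordance in its usual form, in which m judges each rank the same n objects without ties. It is not a reconstruction of the computation reported above, which ranked the published correlation coefficients; the function name and the rankings are hypothetical and serve only to show the two limiting cases.

```python
def kendalls_w(rankings):
    """Kendall's W for m judges who each rank the same n objects.

    rankings: list of m lists, each a permutation of the ranks 1..n
    (no tied ranks are assumed in this sketch).
    """
    m = len(rankings)
    n = len(rankings[0])
    # Total of the ranks each object received across all judges.
    rank_sums = [sum(judge[i] for judge in rankings) for i in range(n)]
    # Expected rank total per object if the judges agreed no better than chance.
    mean_sum = m * (n + 1) / 2
    # Sum of squared deviations of the observed totals from that expectation.
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical rankings of four patients.
perfect = kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
opposed = kendalls_w([[1, 2, 3, 4], [4, 3, 2, 1]])
print(perfect, opposed)
```

W runs from 0 (no agreement among the judges as a group) to 1 (identical rankings), which is why a single value can summarize what a full matrix of pairwise correlations leaves unclear.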

Methodology. Before we discuss the
research on pattern and scatter analysis
with regard to the various disease
entities, the writers deem it necessary
to preface the discussion. Antecedent to
an understanding of the effect of
psychopathology upon W-B performance
seems to be the need to recognize that
factors other than psychopathology
affect subtest performance, and must be
taken into account in such analysis. For
instance, Cohen (23) found an examiner
bias; Collins (27) found variation in
subtest as a function of age; many (18,
48, 51, 89, 109, 120, 125) found
significant deviations on subtest scores
due mainly to sex; Aronov (9) and
O'Connor (90) testify to the effect of
education on scatter; and Aronov (9),
French and Hunt (44), Merrill and
Heathers (85), Norman (89), O’Connor
(90), Ortar (92), and Schnadt (104)
validate the effect of IQ level per se
on scatter.

The reader will understand why
knowledge of these findings is
propaedeutic to an adequate purview of
the research on pattern analysis with
the W-B. In attempts to control for the
effects of these variables, the research
worker has relied upon the statistical
concept of randomization. Statistically
the assumption is maintained that if one
randomizes the effects of these
uncontrollable, hence unmeasurable,
parameters equally throughout the cells
of one’s table, the effect of the
randomization will be to equalize the
effect of these parameters upon the
variance. Such statistical reasoning has
been translated into research
methodology in the form of the matched
group design. However, certain research
suggests that the simple randomized
design is not the answer to a research
worker's prayers.
Frank, Corrie, and Fogel (43) and
Reich (99) have demonstrated that
we are no longer meaningfully speaking
about “neurotics” or “schizophrenics”
when we include wide ranges of
attributes (age, education, IQ) in one
group. For instance, Reich (99)
demonstrated that when he compared the
subtest performance of his group of
schizophrenics from Kings County
Hospital with data from other groups of
subjects, similarly diagnosed, presented
in previous published literature, the
rank-order correlations ranged from .84
to [illegible]. Frank, Corrie, and Fogel
(43) performed an analysis of variance
on the subtest performance of
like-diagnosed cases presented in the
literature which included wide ranges of
IQ, education, age, etc., and found them
to be statistically different. If one
begins by comparing incomparables, the
results are bound to be spurious.
Subtest distortion is not just a
function of research methodology, but
may accrue as a result of factors
inherent in the W-B itself. For
instance, Cohen (26) finds that the W-B
subtests do not always appear to measure
the same factors in different types of
patients. Marks (80) criticized
Wechsler’s qualitative pattern approach
to scatter analysis, since it permits of
a wide range of subtest variation coding
within a given diagnostic category
[example range illegible]. This tends to
“load the data” in terms of a lack of
reliability and validity of Wechsler’s
diagnostic pattern. It appears “safe” to
review research on pattern analysis in
the different diagnostic categories now
that the “warning alarm” has been
sounded.

Schizophrenic patterning. Harper
(55) applied Fisher’s discriminant 
function to pattern analysis. In gen- 
eral, Harper felt that the pattern of 
subtests differentiated his group of 
schizophrenics from his normals, and 
further, that the subtypes, paranoid, 
hebephrenic, and catatonic, were sig- 
nificantly differentiated from the rest 
of the schizophrenics and the normals.

Harper wrote, as regards the regression
equation he had formulated, “The extent
to which the regression weights would be
applicable to a new sample from a
different hospital . . . is not
known . . .” Cross validation was
needed, so on suggestion by Solomon
Machover, Reich (99) performed this
test. He found that the Harper equation
identified 65 per cent of a new
schizophrenic population at Kings County
Hospital as schizophrenic, concluding
that, in general, individual
differentiation was poor.
On a further suggestion by Machover, 
Frank (42) tested the effectiveness of 
Harper’s equation when applied to a 
heterogeneous group of psychotics 
(excluding schizophrenics) at Kings 
County Hospital. He found that the 
Harper equation misidentified 47 per 
cent of this population as schizo- 
phrenic, thereby questioning the 
utility of the formula. 

Rogers (100) compared schizophrenics
and neurotics on fifteen signs
postulated by Rabin, Rapaport, and
Wechsler, and found only eight of them
tenable. However, Rogers grouped the
various subtypes of schizophrenia under
a single heading. On the other hand, on
the basis of a comparison of 50
neurotics and schizophrenics, little
agreement was found between Wechsler
pattern and psychiatric diagnosis.
Whiteman and Whiteman (137) had some
success using factor analysis on the
performance of 50 schizophrenics and 50
police applicants, finding that clusters
entitled “reality perception” and
“psychomotor efficiency” significantly
differentiated the groups. McNeal (77)
checked Wechsler’s signs for
schizophrenia on equated normal and
schizophrenic groups for 340 male
veterans of World War II. “Only one of
Wechsler’s signs discriminates
significantly between the . . . groups
. . . but it identifies schizophrenics
more normal than normals.” What appears
to be a very interesting and
thought-provoking finding is Monroe’s
(88) demonstration that schizophrenics
with average or above average
intellectual level had no more scatter
than neurotics, and that extreme scatter
was a characteristic of schizophrenics
only with low obtained IQ.


Patterning in the affective disorders.
The writers could find but one
investigation in this area during the
entire five-year period, an apparent
sign of the need for more research.
Waldfogel and Guy (126) found
differences [...] depressive or manic
states of manic-depressive psychosis.
However, further analysis revealed that
the parameter of age was [...] an
unknown amount of [...] pattern, hence
depreciating the usefulness of the
results.

Patterning in neurosis. Monroe
(88), using a factorial design to test
the effect of levels of adjustment,
intelligence, and locality, found little
differentiation between neurotics and
well-adjusted normals when the
interaction of these effects was deleted
from the between-group variance.
Schillo (103) found no difference
between neurotics and normals on
disparity between verbal and performance
IQ, performance on individual tests, or
rank-order correlation of test
performance, although his neurotics
showed greater variability from their
own group mean than the normals.
Schillo’s subjects, though, included
individuals subject to anxiety reaction,
obsessive-compulsive reaction, and mixed
neurosis; and no data are offered as to
the possible differences within this
group as related to the subtypes.

Patterning in mental deficiency.
Alderdice and Butler (3) seem to be
the only investigators interested in
identifying a mental defective pattern.
They present thoughtful approaches to
pattern analysis and compare their
findings with those of previous workers.
Their pattern for mental deficiency,
while not a sharply delineated one, is
similar to those previously reported. It
is most at variance with the pattern
initially proposed by Wechsler (131).
Little work seems to have been done in
terms of an understanding of the
performance of the mentally defective
subject on the subtests of the W-B. One
investigation, by McPherson and Fisch
(78), attempts to understand the
defective’s poor performance on S. Using
approximately 30 subjects with an IQ
range of from 55 to 85, they found that
66 per cent of the failures were due to
evasiveness, what McPherson and Fisch
term “learned negativism.” The authors
suggest that as a result of such
behavior, testing may not yield an exact
measure of the defective’s ability.

Patterning in organicity. Collins
(27), investigating the effect of
epileptic involvement, found that only
cases diagnosed with organicity
demonstrated deterioration. Further
analysis revealed the superiority of the
performance of the endogenous to the
exogenous group. A comparison of
Collins’ 400 outpatient epileptics
with the protocols of institutionalized 
groups of epileptics, psychotics, neu- 
rotics, and psychopaths, presented in 
previous research, showed consider- 
able differences. In the analysis of 26 
cerebral arteriosclerotic patients com- 
pared with 26 patients with “other 
forms of cerebral pathology” matched 
for age, education, and IQ, Oppen- 
heim (91) found no significant differ- 
ences on subtests or in total varia- 
bility of performance. 

In one investigation Diers and 
Brown (35) found that there was an 
inverse relationship between W-B IQ 
and the Hughes factor-analytically 
derived signs on the Rorschach for 
intracranial pathology. In another 
investigation Diers and Brown (34) 
analyzed the protocols of 24 patients 
diagnosed as having multiple sclero- 
sis. When compared with normal 
controls, the sclerotic group demon- 
strated a lower memory span for D 
and superior PC; however, a Pearson 
r of .17 (not significant) was obtained 
between Wechsler’s organic signs and
actual organicity. The authors offer
the following alternatives to explain 
their results: “Quantitative signs of 
organic damage to the brain, or the 
index of deterioration on the Wechs- 
ler-Bellevue scale, are inadequate as 
an indicator of existing cortical dam- 
age in multiple sclerosis,” or “No 
cortical pathologic changes existed in 
the population with multiple sclerosis 
comprising this study.” 

Patterning of the sociopath. The
patterning of the sociopath (psychopath,
delinquent) has come under considerable
scrutiny since the last review.
Bernstein and Corsini (12) tested the
validity of Wechsler’s assumption that
performance scores are higher than
verbal scores, which they could not
reject. By the method of forward and
backward presentation of the subtests,
they rejected


RESEARCH WITH W-B INTELLIGENCE SCALE: 1950-1955 


the assumption that this was a spuri- 
ous difference due only to the fact 
that the performance part of the test 
is administered late in the examina- 
tion period permitting the sociopath 
to adjust more adequately. 
Graham (49) found that there was 
a significant relationship between 
performance on the W-B and school 
performance, and that the scatter- 
gram of Wechsler’s adolescent psy- 
chopath closely approximates that 
of the unsuccessful reader. Graham 
makes the appealing suggestion, “It 
does not seem unreasonable to as- 
sume that the profile (Wechsler’s 
adolescent psychopath) is typical of 
educationally retarded youth without
regard to his moral qualities.”

Vane and Eisen (125) tested the
difference on the W-B between matched
groups (age, intelligence,
socioeconomic background) of delinquent
and nondelinquent girls, and found none.
Higher scores on D and [...] for the
nondelinquent than for the delinquent
group was about all they found to
differentiate the two groups.

Gurvitz (52) compared the subtest
performance of a matched group of
inmates diagnosed as psychopaths with
“nonpsychopathic” inmates, and
Wechsler’s standardization group of
psychopaths. Through a chi-square and
t-test of significance, the results
indicated that there was no
characteristic subtest pattern for the
psychopath, and, further, that there
were no differences between the
“psychopathic” and the “normal” prison
inmates and between these two and
Wechsler’s psychopathic population. In a
similar study, Clark and Moore (22) had
previously attempted to differentiate
the test patterns of subclassifications
of military offenders (no NP disorder,
immaturity reaction, and pathological
personality types, presumably
psychopaths) but with no success.




If one considers unauthorized dis- 
charge from a medical hospital as 
asocial or antisocial behavior, Thur- 
ston and Claden’s (123) study on the 
irregular discharge of tuberculosis 
patients becomes pertinent here. 
However, the W-B could not differ- 
entiate the “AWOL” group from 
those who remained in the hospital, 
either in terms of IQ or subtest pat- 
terning. 

Summary. Our somewhat jaun- 
diced eye continues to reject the 
assumption of unique subtest per- 
formances by schizophrenics. The 
results are at best inconsistent, but 
their very inconsistency may testify 
to the erroneous assumption that 
randomization will control for the 
effect of the parameters upon subtest 
performance. But the reader of such 
researches is left in the predicament 
of being uncertain whether to at- 
tribute the negative results to either 
the methodology or to the hypothesis. 

For groups of mental defectives 
patterning seems better established 
than for schizophrenics. The search
for patterns of performance
characteristic of the organic
brain-damaged, the manic depressives,
and neurotics has been rather fruitless
but, as
with schizophrenics, the sampling 
methods have left much to be desired. 

Researches with the sociopath indi- 
cate some points of agreement but 
there is still much disagreement. The 
pattern proposed by Wechsler,
“adolescent psychopath,” is coming to be
evaluated in terms of the subjects’ 
academic training, motivation, and 
general background of experience. 

The over-all impression gained 
from the pattern studies reviewed is 
that the findings are inconclusive. 
Patterns suitable for clinical use will 
not be forthcoming until methodological
improvements appear. Further advances in
typology can do much to establish
suitable diagnostic criteria for which
patterns can legitimately be sought.


GENERAL SUMMARY


The past five years exhibit two
major changes in the research trends
with the W-B. In the first place,
there has been a realignment of the
general nature of the studies. Research
with psychiatric syndromes has been
reduced, whereas a larger proportion of
the studies reviewed deal with the W-B
as a test of general intelligence,
investigating its reliability, validity,
rationale, etc. Secondly, it has been
noted throughout the review that the
number of well-controlled and
statistically sophisticated studies has
markedly increased. One rather new and
somewhat unexpected trend is the
demonstration of sex differences. It
points up an additional uncontrolled
factor in previous studies and has
important implications for the use of
test patterns, scatter, and such. Of
course, the need for a larger, more
representative sample for the
demonstration of sex differences, or
their absence, at different levels of
intelligence still remains. It may be
added that another factor that needs
control in research is that of
socioeconomic level. Social class has
been found to have a bearing on
intelligence test achievement. This
remains comparatively unexplored with
the W-B. Moreover, it may have an
important relationship to the
Verbal-Performance discrepancies noted
in psychopaths, delinquents, and others.
Status in such classifications may be
related to socioeconomic level as well.

When one looks at the work done
with various psychiatric populations
by means of scatter and pattern
methods, one can readily conclude
that “nothing new has been added”
either in methodology or in definite
findings. The results are still
“inconclusive.” It may be wondered
whether it should not be said that
there are no positive results instead
of leaving an open crack in the door,
implicitly indicating a still-tenacious
clinging to an overworked [illegible].

Maybe, with the creation of a
newly standardized instrument, similar
in structure to the W-B, but not
suffering from the numerous weaknesses
which have been pointed out in this
review and in previous papers, more
fruitful research with pattern analysis
will be forthcoming. However, such
research cannot continue to close the
eyes to the weakness of the criterion
itself: psychiatric diagnostic
classification.

What we are here calling interpersonal
perception has also been
called empathy, sensitivity, under- 
standing, diagnosis, social percep- 
tion, etc. In recent years, the follow- 
ing procedure has been much used 
in the investigation of interpersonal 
perception: A person, or a group,
called the Other, provides a
self-description, as by filling out a
personality inventory or rating scale.
Another person, called the Judge,
predicts the Other's self-description,
filling out the same inventory or rat- 
ing scale with the responses he pre- 
dicts the Other would give. The 
Judge’s accuracy score is then the 
closeness of his predictions to the 
actual responses of the Other. 

To many psychologists (e.g., 8, 14)
this operation has qualified almost by
definition as a measure of the Judge’s
ability to empathize with the Other.
It does seem, at first glance, that the
Judge should have to “feel himself
into” the Other's personality,
feelings, attitudes, self concept, and
the like, if he is to predict accurately
the Other's self-descriptions. This
apparently straightforward technique
does not, however, yield simple,
wholesome data. Rather, beneath its
surface, we find an intricate complex of
processes and components.


1 A draft of this paper was presented at a
symposium on interpersonal perception during
the APA meetings, September, 1954. This
investigation was supported by a research
grant (M-650) from the Institute of Mental
Health, National Institutes of Health, Public
Health Service.
Our purposes in this paper are (a) to
identify, classify, and illustrate inter- 
mediary keys—the common meth- 
odological element in many devices 
that have been used to reveal this 
complexity; (b) to describe various 
considerations in the use of this tech- 
nique; and (c) to formulate the rela- 
tionship between our methodological 
model and the analysis-of-variance 
approach developed by others. 


TYPES OF INTERMEDIARY KEY


Our basic idea is that many of the
approaches developed independently
by a host of investigators may be
seen as specific instances of a general
technique which we call the
intermediary key. This technique
consists in developing a protocol, i.e.,
a set of answers to the items, against
which are compared the usual two
protocols in this kind of interpersonal
perception research: the Other’s
self-description and the Judge’s
predictions. We can then obtain measures
of similarity of each of these protocols
to the intermediary key or protocol. The
intermediary key provides an organizing
principle for much of the recent
research in interpersonal perception.
Our examples, each of which is described
below, fall into three categories: (a) a
priori keys made out by the psychologist
on the basis of some theoretically
significant psychological variable; (b)
keys obtained by varying instructions to
Judges; (c) keys based on central
tendencies of predictions or
self-descriptions.
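
The general technique can be sketched in a few lines, under assumptions that are ours and not the paper's: a hypothetical six-item true-false inventory, similarity measured simply as the number of items on which two protocols agree, and an all-affirmative key standing in for the intermediary key. The names and responses below are illustrative only.

```python
def agreement(protocol_a, protocol_b):
    """Number of items on which two protocols give the same answer."""
    return sum(a == b for a, b in zip(protocol_a, protocol_b))


# Hypothetical responses to a six-item true-false inventory.
other_self_description = [True, True, False, True, True, False]
judge_predictions = [True, True, True, True, False, False]

# One possible intermediary key: the affirmative response to every item.
intermediary_key = [True] * 6

# The usual accuracy score: predictions compared with the self-description.
accuracy = agreement(judge_predictions, other_self_description)

# Each protocol is also compared with the intermediary key.
judge_vs_key = agreement(judge_predictions, intermediary_key)
other_vs_key = agreement(other_self_description, intermediary_key)

print(accuracy, judge_vs_key, other_vs_key)
```

In this made-up case the Judge and the Other resemble the key to the same degree, so part of the accuracy score could reflect nothing more than a shared tendency toward the keyed response.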


A Priori Keys


1. An intermediary key consisting 
of the affirmative responses to all of 
the items is used in work (4) on re- 
sponse sets, such as acquiescence, in 
test-taking. Scoring the Judge's pre- 
dictions and the Other's self-descrip- 
tions with such an acquiescence key 
shows that the accuracy score can be 
influenced by a coincidence between 
the acquiescence tendencies of the 
Judge and the Other. If they have 
similar tendencies in using the
acquiescence end of the response
continuum, the Judge will have high
accuracy. If the Judge has a markedly
higher or lower acquiescence tend- 
ency than does the Other, his ac- 
curacy will be low. 

2. A key for the favorability of
Judges’ predictions and of the Others’
self-descriptions (22) embodies
psychologists’ opinions concerning the
favorability to self-esteem of the two
or more choices provided by each
inventory item. When Judges’
predictions are scored with a key made
up of the favorable choices we obtain
favorability-of-prediction scores.
These will correlate positively with
the accuracy scores insofar as the
Other described himself favorably.
Hence, favorability keys can be used
to expose an otherwise concealed
aspect of interpersonal perception.
Then accuracy may result not from
perceiving cues and inferring
correlates, from having intuition, or
from Einfühlung. Rather, it may result
from a fortuitous concomitance between
the Judge’s favorable impression
concerning an Other or toward a group of
which the Other is a member, and the
Other’s tendency to describe himself
favorably.

3. An adjustment key was used
(19) when the predictions and
self-descriptions were obtained on the
Bell Adjustment Inventory. This
made possible measures of the Judges’ 
attribution of adjustment and the 
Others’ self-described adjustment. 
Such measures should have much in 
common with those yielded by the 
favorability key described above. 


Keys Obtained by Varying Instructions 
to Judges 


4. Suppose we obtain the Judge’s 
self-description on the same items 
as those on which he predicted the 
Other’s response (2, 9, 10, 17, 21). 
Comparing this protocol with the 
Other's self-description yields a score 
that has been called “real similarity”; 
comparing the Judge’s self-descrip- 
tion with his predictions of the Other 
yields a score for the Judge’s assumed 
similarity between the Other and 
himself. This procedure can be used 
to break down any accuracy score
into two components, “warranted
assumed similarity” and “warranted
assumed dissimilarity.” Judges are
accurate in some cases because they 
are highly similar to the Other and 
assume high similarity; in other in- 
stances Judges are accurate because 
they are dissimilar and assume little 
similarity. 

5. When the Judge predicts the 
responses of the typical member of 
the subcultural group to which the 
Other belongs, we can obtain a 
stereotype key (10). Applied to
predictions, such a key yields a
“rigidity” score for the Judge,
reflecting
the degree to which he tends to see 
the Others as typical, and a simi- 
larity-to-stereotype score for the 
Other. Accuracy occurs if the Judge 
follows his stereotype when the Other 
is actually similar to it. For a group 
of Others who strongly resemble a 
Judge’s stereotype, we get high
accuracy when the Judge is “rigid” in
the sense of thinking that the Others
are all like his stereotype. 

6. Manifest stimulus value, the im- 
pression that the Judge forms as to 
the actual personality of the Other, 
can be distinguished from the Judge's 
prediction of how the Other will de- 
scribe himself. For example, an ef- 
feminate boy may have such a mani- 
fest stimulus value that we would de- 
scribe him as really preferring to play 
girls’ games. But we would predict 
that he would choose the boys’ games 
on an interest inventory because he 
could not be expected to admit to 
interests so clearly out of line with his 
sex role. Suppose we ask a group of 
Judges to describe a boy as they 
think he really is and not necessarily 
as they predict he would respond to 
a personality inventory. The result- 
ing manifest stimulus value key 
would be made up of the modal 
Judge's descriptions of the Other. Ap- 
plied to the Other’s self-descriptions 
it yields an “insight-and-frankness”
score (22). Applied to the predictions
of Judges, it yields an
“attribution-of-insight-and-frankness”
score. Now, assuming correct perception
of the manifest stimulus value, accuracy
can be fractionated into the Judge’s
warranted
attribution-of-insight-and-frankness and
warranted
attribution-of-self-deception-and-lack-of-frankness.
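
The breakdown described in item 4 above can be made concrete with a small sketch. It assumes a hypothetical six-item true-false inventory and item-by-item agreement as the similarity measure; the function name and the responses are ours, not the authors'. An accurate prediction counts as warranted assumed similarity when the Judge predicted his own response, and as warranted assumed dissimilarity when he predicted a response unlike his own.

```python
def decompose_accuracy(judge_self, judge_pred, other_self):
    """Split a Judge's accuracy into two warranted components."""
    items = list(zip(judge_self, judge_pred, other_self))
    return {
        # Judge's self-description versus the Other's self-description.
        "real_similarity": sum(js == os_ for js, _, os_ in items),
        # Judge's self-description versus his predictions for the Other.
        "assumed_similarity": sum(js == jp for js, jp, _ in items),
        # Accurate items on which the Judge predicted his own response.
        "warranted_similarity": sum(
            jp == os_ and jp == js for js, jp, os_ in items
        ),
        # Accurate items on which the Judge predicted a response unlike his own.
        "warranted_dissimilarity": sum(
            jp == os_ and jp != js for js, jp, os_ in items
        ),
        # The usual accuracy score.
        "accuracy": sum(jp == os_ for _, jp, os_ in items),
    }


# Hypothetical protocols.
judge_self = [True, True, False, False, True, False]
judge_pred = [True, False, False, True, True, True]
other_self = [True, False, True, True, True, False]

scores = decompose_accuracy(judge_self, judge_pred, other_self)
print(scores)
```

Because every accurate item falls into exactly one of the two warranted categories, the two components always sum to the accuracy score.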


Keys Based on Central Tendencies of 
Predictions or Self-Descriptions 


7. Modal prediction keys can take 
two forms: (a) for each item the aver- 
age prediction obtained from a group 
of Judges in predicting the response of 
a single Other, and (b) for each item, 
the average prediction of a single 
Judge for several Others. The first 
of these can break down the accuracy
score into typicality-of-prediction
and
predictability-of-the-Other’s-self-description.
When Judges make
highly typical predictions for highly 
predictable Others, they are accu- 
rate; when they make atypical predic- 
tions for Others who are missed by 
the majority of the Judges, they will 
also be accurate. The second kind of 
modal prediction key can be used to 
obtain a measure of the Judge’s “im- 
plicit stereotype.” It is involved in 
measures of stereotype accuracy (5). 
8. A key embodying the modal 
self-description of the Others (15), Len 
the manner in which the majority of 
Others describe themselves, yields a 
score for the similarity of the Judge's 
predictions to the modal self-descrip- 
tion and a score for the Other's typi- 
cality. Accuracy then becomes & 
function of these two scores. Tallan 
(23) also averaged self-descriptions 
but used them only as the key for 
scoring the individual member's ac- 
curacy in evaluating group opinion- 
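
The first form of the modal prediction key (item 7a) can be sketched in a few lines. This is only an illustration under stated assumptions: the items are two-choice, the "modal" response is taken as the majority response of the Judges on each item, scores are simple proportions of agreement, and the data and function names (`modal_key`, `agreement`) are hypothetical rather than taken from the studies cited.

```python
from collections import Counter

def modal_key(protocols):
    """Item-by-item modal (most frequent) response across several protocols."""
    return [Counter(item).most_common(1)[0][0] for item in zip(*protocols)]

def agreement(a, b):
    """Proportion of items on which two protocols give the same response."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical data: three Judges predict one Other on four two-choice items.
predictions = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]
self_description = [1, 1, 1, 1]

key = modal_key(predictions)

# Typicality-of-prediction: each Judge's agreement with the modal key.
typicality = [agreement(p, key) for p in predictions]

# Predictability-of-the-Other's-self-description: the Other's agreement with the key.
predictability = agreement(self_description, key)

# Accuracy: each Judge's agreement with the Other's self-description.
accuracy = [agreement(p, self_description) for p in predictions]

print(key, typicality, predictability, accuracy)
```

With an odd number of Judges and two-choice items there are no ties; a tie-breaking rule would be needed otherwise.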


EVALUATING AND USING INTER- 
MEDIARY KEYS 


Bases for Evaluating Intermediary 
Keys 


How can we choose from the many
possibilities those intermediary keys
that have genuine value in the analysis
of interpersonal perception? Three
bases for evaluating intermediary
keys are as follows:

1. The internal consistency over
items of the score obtained by applying
the intermediary key to the
Judges’ predictions. Unless this
internal consistency is substantial, the
intermediary score cannot be
considered psychologically characteristic
of the Judge.

2. Degree of confounding of two or
more possible intermediary keys. For
example, the acquiescent choices on
an inventory (e.g., the California F
scale, the Minnesota Teacher Attitude
Inventory) may also tend to be
the unfavorable choices. If so, we
can determine which of the two is
operating only by revising our inventory.
When the confounding is eliminated,
scores obtained with one of the
keys may lose their internal consistency.

3. Degree to which the intermediary
key yields scores logically
attributable to genuine social interaction
between the Judge and the
Other. In some situations, a given
intermediary score may reflect a
characteristic of the Judge which existed
prior to his observation of the Other.
In other situations, the same
intermediary score may reflect genuine
influences of observation or interaction,
e.g., a favorable reaction to a
particular Other. We should
distinguish between such post- and
preinteraction intermediary scores.
Evidence for the preinteraction
character of an intermediary score is
obtained from the existence of
generality of the score over Others,
especially if the Others are heterogeneous
with respect to the variable(s). When
the intermediary scores obtained on
Judges’ predictions for heterogeneous
Others correlate highly among
themselves, the predictions are probably
autistically determined rather than
determined by evidence concerning
the Others.


Using Intermediary Keys 


These considerations can be used
to examine possible intermediary
keys both empirically and logically.
Those which prove upon examination
to be relevant to the particular
problem at hand should be used in the
analysis of interpersonal perception.
Thus, use of intermediary keys may
reveal that the measures obtained in
a given situation are highly loaded
with conceptually irrelevant variance.

If so, such irrelevant influences on 
interpersonal perception measures 
can often be reduced by lowering the 
internal consistency of the inter- 
mediary score obtained on the 
Judge’s predictions. For example, 
the information available to the 
Judge can be made more relevant to 
the predictions requested. Then the 
Judge may have less tendency to fall 
back upon autistic response sets. Al- 
though assumed similarity scores, for 
example, would still be obtainable, 
they would no longer be general over 
items or Others. An item format in
which the response alternatives
change from one item to the next will
reduce the likelihood of a positional
response set (4) in the Judges’
predictions, e.g., reliable individual
differences in tendency to choose the
first of two response alternatives.
Similarly, a forced-choice format, in 
which the favorability of various 
choices has been equated within each 
item, will reduce the reliable favora- 
bility-of-prediction variance in the 
Judges’ predictions. 

Another method of minimizing the 
influence of general response disposi- 
tions is to give credit for accuracy 
only when the Judge correctly dif- 
ferentiates in his predictions for two 
Others (or between an Other and 
himself) who answered an item dif- 
ferently. A “refined empathy” score 
derived by subtracting the assumed 
similarity score from the accuracy 
score has been suggested (12). Partial 
correlation has also been considered, 
i.e., basing the accuracy score on the 
partial correlation of the Judge’s 
prediction with the Other's self-de- 
scription, holding real similarity con- 
stant. Both of these methods have 
proved on further examination to be 
inadequate (3, 11). We have scored
Judges’ predictions on only those
items on which two Others responded 
differently, giving credit for accuracy 
only when the Judge correctly pre- 
dicted the difference. Corrected split- 
half reliabilities for this score for two 
groups of Judges each predicting for 
a different pair of Others were .65 
and .51. 
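
The differential scoring rule just described can be illustrated as follows. This is a sketch, not the authors' scoring procedure: it assumes two-choice items, treats "correctly predicted the difference" as predicting each of the two Others' responses correctly on an item where they differ, and uses made-up data. The `spearman_brown` helper applies the standard Spearman-Brown step-up formula, which is assumed here to be the correction behind "corrected split-half reliabilities"; the text does not name it.

```python
def differential_accuracy(pred_a, pred_b, self_a, self_b):
    """Count credited items among those on which Others A and B differ.

    Returns (items credited, items on which the two Others differ).
    """
    differing = [i for i in range(len(self_a)) if self_a[i] != self_b[i]]
    credited = [
        i for i in differing
        if pred_a[i] == self_a[i] and pred_b[i] == self_b[i]
    ]
    return len(credited), len(differing)

def spearman_brown(r_half):
    """Step up a half-test correlation to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)

# Hypothetical protocols on five two-choice items.
self_a = [1, 0, 1, 1, 0]
self_b = [1, 1, 0, 1, 1]
pred_a = [1, 0, 1, 0, 1]
pred_b = [1, 1, 1, 1, 1]

score = differential_accuracy(pred_a, pred_b, self_a, self_b)
print(score)                 # credited items, scorable items
print(spearman_brown(0.5))   # corrected reliability for a half-test r of .50
```

Because items on which the two Others agree are never scored, a Judge who gives the same response to everyone earns no credit, which is the point of the procedure.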


How Intermediary Scores Account for 
Nongenerality of Accuracy 


Intermediary keys have shed light 
on the failure to find generality over 
Others in measures of accuracy of 
interpersonal perception. Typically, 
accuracy in judging or predicting for 
one Other has correlated less than .35 
with accuracy in judging or predicting
for a different Other (e.g., 1, 2,
3, 7, 10, 13, 16). At the same time,
generality over items in accuracy of
predicting for a single Other has often
been substantial, i.e., .7 and higher.
How can we account for such re- 
sults? By use of intermediary keys, 
we have demonstrated (22) that 
standard Others, i.e., fifth-grade boys 
and girls presented to Judges by 
means of sound films, did serve as 
discriminable social stimuli. Specifi- 
cally, a favorability key showed that 
the Judges made consistently and 
significantly more favorable predic- 
tions for some of the children than 
for others. The corrected split-half 
reliabilities of the favorability-of- 
prediction scores ranged from .81 to 
.90, with a median of .86. Now the
accuracy of a Judge in predicting the
responses of a particular child
depends in part on the congruence
between the favorability of his
perception of the child and the favorability
of the child’s self-description. Thus,
the rank order in magnitude of four
median correlations between
favorability and accuracy was exactly the
same as the rank order in favorability
of the four children’s self-descriptions.

It is instructive to examine the 
structure of the relationship between 
the favorability of a single Judge’s 
predictions and of the child’s self- 
description. We can form a triad 
from (a) the prediction protocol of 
the Judge, (b) the self-description 
protocol of the Other, and (c) the 
favorability key. Matching each of 
these against the other yields three 
scores: (ab) accuracy, (ac)
favorability-of-prediction, and (bc)
favorability-of-self-description. When any
two of these scores or proportions are
in any degree fixed, the third is
partially determined. Thus, suppose
that, from observing the Other, the
Judge forms an over-all impression
such that he will predict favorably
concerning the Other on 75 per cent
of a set of two-choice items. Further,
suppose that the Other describes
himself favorably on 75 per cent of
the items. In this case, the “chance”
level of accuracy with two-choice
items will no longer be 50 per cent;
rather, if no determinants of accuracy
were operating other than the
favorability “sets,” chance success on a set
of favorability-loaded items would be
62.5 per cent. Any key may be
substituted for the favorability key
without altering the logic of the
triadic relationship.
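
The 62.5 per cent figure can be checked with a short calculation. The sketch below assumes that, apart from the two favorability "sets," the Judge's and the Other's favorable choices fall on items independently of each other; the function names are illustrative only. It also shows the sense in which the third score of the triad is "partially determined": once the two favorability proportions are fixed, accuracy can only fall within a limited range.

```python
def chance_accuracy(p_pred, p_self):
    """Expected agreement on two-choice items when the Judge chooses the keyed
    alternative with proportion p_pred, the Other with proportion p_self, and
    the two sets of choices are independent across items (an assumption)."""
    return p_pred * p_self + (1 - p_pred) * (1 - p_self)

def accuracy_bounds(p_pred, p_self):
    """Lowest and highest agreement that are arithmetically possible once the
    two keyed proportions are fixed."""
    lowest = abs(p_pred + p_self - 1)
    highest = 1 - abs(p_pred - p_self)
    return lowest, highest

print(chance_accuracy(0.75, 0.75))   # 0.625
print(accuracy_bounds(0.75, 0.75))   # (0.5, 1.0)
print(chance_accuracy(0.5, 0.75))    # 0.5
```

The last line shows that the conventional 50 per cent chance level is recovered whenever either party splits evenly on the key.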

In some of our illustrations, as
noted above, the relationship
between the “intermediary key” and
the prediction and self-description
protocols may be entirely or partially
determined by actual social perception
of the Other by the Judge. The
two scores yielded by the modal
prediction key, for example, are both at
least partially determined by genuine
social interaction between the Judge
and the Other.

But other keys may yield two
scores of which one may be fixed
before the Judges see the Other. The
favorability key discussed earlier 
illustrates such a possibility. That is, 
the favorability of the Other's self- 
description is determined before the 
Judge observes the Other. When the 
intermediary key is the Judge’s prior 
self-description, we have another 
situation where one of the scores (real 
similarity) is not influenced by the 
Judge’s interaction with this Other. 
The Judge’s assumed similarity to 
the Other may also be partially de- 
termined by factors extrinsic to the 
interaction between Judge and Other. 
It has been reported (20) that the 
tendency to assume similarity may 
be general over Others and over var- 
ied test content. Only to the extent 
that the assumed similarity score of 
the Judge is not autistic, but actually 
influenced by the Other, can we
consider it a result of social interaction.
Finally, it is possible to develop
intermediary keys where neither of
the scores yielded by the prediction
and the self-description may be
influenced by the Judge-Other
interaction. In some cases, as would be true
of assumed similarity if it were a
response disposition general over all
traits and all Others, the scores may
have psychological significance
although they do not reflect genuine
understanding of Others. In other
cases, however, the prior response
dispositions which influence accuracy
may seem so irrelevant to any social
interaction between the Judge and
the Other that they may be considered
as nothing more than mathematical
artifact. Thus, the tendency of the
Judge and the Other to choose a as
against b, when in doubt as to the
correct alternative in a two-choice
item, may yield a triad of this
quasi-mathematical form. Here, of course,
the intermediary key consists of all
a responses. It is hard to imagine how
such tendencies might be related to
social interaction between Judge and
Other, but it is readily apparent that
they would influence the accuracy of
the Judges’ predictions.

How does this discussion of inter- 
mediary keys bear on the problem of 
generality of accuracy in perceiving 
standard persons? We feel that the 
finding of reliability of accuracy for 
one Other without generality over 
Others can be explained by the fortui- 
tous occurrence of these triad rela- 
tionships. Our methods of measuring 
accuracy often make the operation of 
the various global judgments, regard- 
ing favorability and typicality, and 
response dispositions, such as the 
tendency to assume similarity or to 
respond a, very influential in the de- 
termination of accuracy; hence we 
obtain accuracy scores that are reli- 
able over items for a single Other. 
But we know that a conclusive 
demonstration of generality of the 
ability to predict the responses of 
Others requires that the Others be 
dissimilar. By our selection of dis- 
similar Others we make it inevitable 
that the combination of the Judges’ 
attributes and Others’ attributes will 
be different for each Other. Thus we 
prejudice the results against a spuri- 
ous kind of generality due to Judge- 
Other response contingencies, and 
indeed find relatively little ability on 
the part of the Judges to make
genuine differentiations.

Crow (7) has demonstrated that 
generality over Others of the Judges’ 
response sets can account for the 
generality over Others of predictive 
accuracy. That is, in Crow’s data, 
accuracy seemed to be general over 
Others to the degree that (a) those 
Others were homogeneous and (b) the 
Judges had response sets that were 
general over the Others.


THE INTERMEDIARY KEY IN
RELATION TO ANALYSIS OF 
VARIANCE 


The intermediary key explicitly 
generalizes many methods of analyz- 
ing interpersonal perception data 
that have already been used. One 
other major approach to analyzing 
interpersonal perception has been 
developed which has not, at first 
glance, involved such keys. This is 
the analysis of variance design de- 
veloped by Cronbach (5). 

The relationship between the in- 
termediary key and analysis of vari- 
ance approach can be seen by refer- 
ring to the k-space model developed 
by Osgood and Suci (18) and by 
Cronbach and Gleser (6). In this 
model, the Judge’s predictions and 
the Other’s self-descriptions are each 
represented as a point in a k-space, 
where k is the number of items or 
scales. Accuracy is then measured 
inversely as the distance, D, between 
the two points. D equals the square 
root of the sum over items or scales 
of the squares of the differences be- 
tween predictions and self-descrip- 
tions. In the analysis of variance 
approach, accuracy is dissected by 
showing that it consists of differences 
between Judges and Others in response
set, in scatter over the items,
in scatter over Others, and so on.
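
The distance measure described above, D equal to the square root of the sum over the k items of the squared differences between prediction and self-description, can be written directly. This is a minimal sketch with made-up ratings on four scales; the function name is illustrative.

```python
import math

def distance(predictions, self_descriptions):
    """Distance D between a Judge's predictions and an Other's self-descriptions,
    each treated as a point in a k-space (k = number of items or scales)."""
    return math.sqrt(
        sum((p - s) ** 2 for p, s in zip(predictions, self_descriptions))
    )

# Hypothetical ratings on four scales; the differences are 0, 3, 0, and 4.
print(distance([3, 5, 2, 4], [3, 2, 2, 0]))   # 5.0
```

Since accuracy is measured inversely as D, a distance of zero corresponds to perfect accuracy.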

Each of the arithmetic means in- 
troduced as a reference point from 
which to compute a component of 
accuracy (analogous to a component 
of variance) may be considered an 
intermediary key. For example, the 
mean self-description of all Others 
on all items, from which the response 
set component of accuracy is computed,
may be considered an intermediary
key interposed between predictions
and self-descriptions. In
terms of Cronbach's analysis, this
key could be used to score the average
of the Judge's predictions over all
items and all Others, yielding what is
then called the elevation component
of accuracy. As a second example,
the differences between the mean
self-description of one Other on all
items and the mean self-description
of all Others on all items can be used
as an intermediary key, this time
applicable to the predictions for a
particular Other. Applied to the
corresponding differences between the
Judge’s average prediction for all
Others and for the particular Other,
this key would yield measures of
what Cronbach called differential
elevation. A final example is a key
consisting of the mean response of the
Others on each item; applied to the
mean prediction of a Judge for all
Others on each item, this would yield
a measure of the Judge’s stereotype
accuracy.
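
The three mean-based keys just described can be made concrete with a small numerical sketch. The data are hypothetical (one Judge, three Others, three rating-scale items), the helper names are illustrative, and the simple differences shown here only indicate which means are compared; they do not reproduce Cronbach's full partition of the accuracy score.

```python
from statistics import mean

def grand_mean(matrix):
    """Mean over all Others and all items."""
    return mean(v for row in matrix for v in row)

def other_means(matrix):
    """Mean over items for each Other (one value per row)."""
    return [mean(row) for row in matrix]

def item_means(matrix):
    """Mean over Others for each item (one value per column)."""
    return [mean(col) for col in zip(*matrix)]

# Rows are Others, columns are items (hypothetical ratings).
self_desc = [[4, 2, 3], [2, 2, 2], [5, 4, 3]]
pred = [[3, 3, 3], [3, 1, 2], [4, 4, 4]]   # one Judge's predictions

# Elevation: the grand mean of self-descriptions serves as the key.
elevation_error = grand_mean(pred) - grand_mean(self_desc)

# Differential elevation: each Other's mean as a deviation from the grand mean.
key_diff = [m - grand_mean(self_desc) for m in other_means(self_desc)]
pred_diff = [m - grand_mean(pred) for m in other_means(pred)]

# Stereotype accuracy: item means over Others, key versus predictions.
stereotype_key = item_means(self_desc)
stereotype_pred = item_means(pred)

print(elevation_error, key_diff, pred_diff)
print(stereotype_key, stereotype_pred)
```

In this made-up case the Judge matches the Others perfectly in elevation and differential elevation, so any inaccuracy must lie in the item-by-item components.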

Each intermediary key represents
a reference point around which a
component of accuracy variance
could be analyzed. Not only means
over items, over Others, over Judges,
or over any two of these may be used
to dissect accuracy. Rather, the
intermediary key formulation suggests
that psychologically or logically
defined reference points can be used;
for example, a reference point can
be constructed consisting of all
“favorable” or all “adjusted” responses.
Accuracy can then be investigated as
a function of the concomitance
between predictions and self-descriptions
along the “favorability” or
“adjustment” dimensions. Which of
the many possibilities the analyst of
variance or the user of intermediary
keys chooses will depend upon his
purposes and insight.




SUMMARY 


The intermediary key consists of
a protocol, e.g., a set of responses to
a questionnaire or rating scale. When
interposed between the Judge’s pre- 
dictions and the Other's self-descrip- 
tions, the intermediary key sheds 
light on the processes involved in 
interpersonal perception. Eight
illustrative intermediary keys, drawn
from recent investigations, are
described. Among the considerations
that may be used in evaluating
intermediary keys are the internal
consistency of their scores, their degree
of confounding with other keys, and
the degree to which the processes
revealed by such keys may be
attributed to social interaction between
Judge and Other, as against autistic
response sets or mathematical
artifacts.

Intermediary keys have been useful
in showing how accuracy in
predicting Others’ responses can be
general over items but not over Others.
Similarly, they have revealed how
the processes affecting accuracy in
interpersonal perception may or may
not be attributable to genuine
perception of the Other by the Judge.

The intermediary key approach is
related to the analysis of variance
approach in analyzing interpersonal
perception because the reference
points in both approaches may be
considered points in a k-space defined
by the k items or scales of a
questionnaire or rating scale. The
intermediary key can be used to define
psychologically as well as
mathematically meaningful reference
points.


It is the purpose of this report to
describe a developing picture of the
structure of human, adult intellect,
as seen in terms of factors. Although
the picture is incomplete, presenting
it at this time seems desirable for two
reasons. The picture now includes
some forty different factors, most of
which are generally unfamiliar. Many
have only recently been demonstrated.
Enough of the intellectual
factors are known to suggest strongly
the outlines of a system. The system
has interesting theoretical implications
and, by reason of certain vacancies
that appear, it points to still
undiscovered factors, somewhat as the
chemists’ periodic table has served
to indicate unknown elements.

As the writer has emphasized before
(10, 13), psychology and psychologists
since Binet have taken a much
too narrow view of human intelligence.
We do not need to go into the
reasons here. They can be summed
up in a positive manner by saying
that in attempting to fathom the
nature of intellect more attention
should be given to the human adult,
particularly the superior human adult.
It is to such specimens that we must
turn if we are to investigate intellectual
abilities and functions in their
greatest scope and variety.

The advent of multiple-factor
analysis has done something to broaden
our conception of human
intelligence, but factor theory and
the results of factor analysis have
had little effect upon the practices of 
measurement of intelligence. We do 
have a great variety of tests in such 
intelligence scales as the Binet and 
its revisions and in the Wechsler 
scales, to be sure. Too commonly, 
however, a single score is the only
information utilized, and this single
score is usually dominated by
variance in only one or two factors.
There is some indication of more gen- 
eral use of part scores, as in connec- 
tion with the Wechsler tests, but each 
of these scores is usually factorially 
complex and its psychological
meaning is largely unknown as well as
ambiguous. The list of factors that is
to be presented in this article should
clearly demonstrate the very limited
information that a single score can
give concerning an individual, and
on the other hand, the rich
possibilities that those factors offer for
more complete and more meaningful
assessments of the intellects of
persons.

Some seven years ago the writer 
initiated research aimed essentially 
at the study of adult, human
intelligence, in a project on “aptitudes of
high-level personnel.”¹ In some
respects this has been a continuation of
wartime research in the AAF Aviation
Psychology Research Program
(21). The project was initiated with
the conviction that the full scope of
human intellect had not yet been
explored, by factor-analysis methods
or by any other methods. Thinking
abilities, which have played important
roles in some definitions of
intelligence, seemed to have been
neglected; particularly abilities having
to do with productive thinking.
Accordingly, four areas of thinking were
selected for study, arbitrarily
designated as reasoning, creativity,
planning, and evaluation. While abilities
belong to the context of individual
differences, they also imply
psychological functions of individuals.
Hence it was thought that the
findings would have much to offer toward
an understanding of human thinking
of various kinds, including problem
solving.

1 Project 150-044, under Contract
N6onr-23810, with the Office of Naval Research,
monitored by the Personnel and Training
Branch. Among those who have made the
most significant contributions to the project
are: Raymond M. Berger, Paul R. Christensen,
Andrew L. Comrey, Russel F. Green,
Alfred F. Hertzka, Norman W. Kettner, and
Robert C. Wilson. I am particularly indebted
to Christensen and Kettner for reading the
preliminary draft of this paper, and to Philip
R. Merrifield, also, for making suggestions.

Space does not permit describing
in detail the research procedures;
they have been described in the
various technical reports from the
aptitudes project (14, 15). It should be
pointed out, however, that the factor
analyses were done in a research
design that includes experimental
features. Each investigation starts by
hypothesizing that certain unitary
abilities (psychological factors) exist
and that they have certain properties.
Psychological tests are then
selected, adapted, and constructed
for each hypothesized factor in a
way that should lead to a “yes” or
“no” answer from the analysis. The
results should show that the factor
hypothesized does or does not exist
and it does or does not have the
properties suggested. Thus, the kind of
psychological test is an important
independent variable, more or less
under the control of the investigator.
Certain other experimental variables
are held relatively constant: the
testing conditions and certain
population features, such as sex, age,
education, and motivation. The
examinees have been men who were
previously selected for military training
leading to an officer's commission,
and they have been tested under
ordinary military discipline.

In his survey of aptitude factors
published in 1951, French (8) listed,
among others, 18 or 19 factors that
can be classified as intellectual. Our
investigations of thinking abilities
have verified and helped to clarify
many of these factors, besides
introducing approximately as many new
ones. Other recent investigations
have also contributed new information
regarding factors. The list
presented here comes from all these
sources.


CLASSES OF INTELLECTUAL FACTORS


Inspection of the total list shows
that the intellectual factors fall into
two major groups: thinking and
memory factors. The great majority
of them can be regarded as thinking
factors. Within this group, a
threefold division appears: cognition
(discovery) factors, production factors,
and evaluation factors.² The
production group can be significantly
subdivided into a class of
convergent-thinking abilities and a class of
divergent-thinking abilities.

2 In the system of the intellectual factors to
be described here, the reader will note some
striking similarities to a system developed
independently by Burt (2). The similarities are
support for the idea that a system does exist.


Cognition (Discovery) Factors


The cognition factors have to do
with becoming aware of mental items
or constructs of one kind or another.
In the tests of these factors,
something must be comprehended,
recognized, or discovered by the examinee.
They represent functions on the
receiving side of behavior sequences.

The cognition abilities can be
differentiated along the lines of two
major principles. For some time we
have been aware that thinking
factors tend to pair off according to the
material or content used in the tests.
For each factor of a certain kind
found in verbal tests there seemed to
be a mate found in tests composed of
figures or designs. We found, for
example, a factor called eduction of
perceptual relations, parallel with a
factor called eduction of conceptual
relations; a factor called perceptual
foresight, parallel to one called
conceptual foresight; and a factor of
perceptual classification, parallel with
one of conceptual classification. Only
recently there has been increasing
evidence for a third content
category. Factors were found in tests
whose contents are letters, or
equivalent symbols, where neither
perceived form or figure nor verbal
meaning is the basis of operation.
Factors based upon this type of
material have been found, parallel to
other factors where the test content
is figural or verbal. Thus a third
content category seems necessary.

A second major principle by which
cognition factors may be differentiated
psychologically depends upon
the kind of thing discovered; whether
it is a relation, a class, or a pattern,
and so on. Thus, for each combination
of content and thing discovered,
we have a potential factor. The
cognition factors can therefore be
arranged in a matrix as shown in Table
1. The third and fourth rows seem
to be complete at the present time.
There are vacancies in the other four
rows. With each factor name are
usually given two representative tests by
name to help give the factor
operational meaning.³ A word or two will
be said in addition regarding the less
familiar tests.⁴

It should not be surprising to find 
the factor of verbal comprehension, 
the best known, and the dominant 
one in verbal-intelligence tests gen- 
erally, in the first row of the cogni- 
tion factors and in the conceptual 
column. The fact that the cognition 
factors sometimes come in threes 
leads us to look for parallel factors 
for the perceptual and structural 
columns. One candidate for the 
perceptual cell in this row would be 
the well-known factor of perceptual 
speed. This factor has to do with dis- 
criminations of small differences in 
form rather than in awareness of 
total figures, hence it does not quite 
fill the requirement of parallel prop- 
erties with verbal comprehension. A 
better factor for this purpose is the 
one Thurstone (28) called “speed and 
strength of closure,” called figural 
closure in Table 1. For this factor, 
awareness of perceived objects from 
limited cues is the key property. The 
limitation of cues is necessary to 
make the test sufficiently difficult for 
testing purposes. 

There is no known factor that 
seems to belong in the second column 
of the first row of Table 1. In gen- 
eralizing the class of three such fac- 
tors, and in differentiation from 
other classes in Table 1, it is clear 
that those in the first row have to do 
with awareness of items, elements, or 
things. To denote this category 
Spearman’s term “fundament” has
been adopted.


3 It should not be inferred that these are the 
only kinds of tests related to the factor. 
4 For more complete descriptions of the tests
see particularly (14, 17, 21).


TABLE 1
COGNITION (Discovery) FACTORS

Type of thing known  | Type of content
or discovered        | Figural                          | Structural                       | Conceptual
---------------------+----------------------------------+----------------------------------+---------------------------------
Fundaments           | Figural closure                  |                                  | Verbal comprehension
                     |   Street Gestalt Completion      |                                  |   Vocabulary
                     |   Mutilated Words                |                                  |

Classes              | Perceptual classification        |                                  | Verbal classification
                     |   Figure Classification          |                                  |   Word Classification
                     |   Picture Classification         |                                  |   Verbal Classification

Relations            | Eduction of perceptual relations | Eduction of structural relations | Eduction of conceptual relations
                     |   Figure Analogies               |   Seeing Trends II               |   Verbal Analogies
                     |   Figure Matrix                  |   Correlate Completion II        |   Word Matrix

Patterns or systems  | Spatial orientation              | Eduction of patterns             | General reasoning
                     |   Spatial Orientation            |   Circle Reasoning               |   Arithmetic Reasoning
                     |   Flags, Figures, Cards          |   Letter Triangle                |   Ship Destination

Problems             |                                  |                                  | Sensitivity to problems
                     |                                  |                                  |   Seeing Problems
                     |                                  |                                  |   Seeing Deficiencies

Implications         | Perceptual foresight             |                                  | Conceptual foresight
                     |   Competitive Planning           |                                  |   Pertinent Questions
                     |   Route Planning                 |                                  |   Alternate Methods
                     |                                  |                                  | Penetration
                     |                                  |                                  |   Social Institutions
                     |                                  |                                  |   Similarities



Two factors involving ability to
recognize classes are known, one in
which the class is formed on the basis
of figural properties and the other on
the basis of meanings. It was
interesting that the Picture Classification
test had more relation to the
perceptual-classification factor than to the
conceptual-classification factor in spite
of the fact that the things to be
classified were common objects, the basis
for whose classification was intended
to be their meanings. This might
mean that the perceptual-conceptual
distinction is a somewhat superficial
matter, pertaining only to how the
material is presented. It is possible,
however, that in many of the items
in this test the general shapes and
sizes and other figural properties are
an aid in classification. For example,
there are cleaning implements,
containers, etc., in some items, where
similarities of appearance may serve
as clues.

The difference between the Word
Classification test and the Verbal
Classification test is largely in the
form of presentation of the problems.
A sample item from the Word
Classification test is: “A. horse B. cow
C. man D. flower.” Which word does
not belong? In the Verbal Classification
test, two short lists of words are
given to establish two classes, e.g.,
animals and pieces of furniture. A
longer list of words is given, each
one of which must be marked as
belonging to one class or the other or
to neither class.

Is there likely to be a factor having 
to do with the seeing of classes when 
class membership depends upon struc- 
tural properties? Such a factor would 
be reasonable. We have much to 
learn regarding the scope of struc- 
tural ideas. Thus far, structural fac- 
tors have been found only in tests 
utilizing letters and very simple forms 
such as circles, dashes, and the like. 
One can raise the question whether 
mechanical conceptions, for example, 
belong in this class. There is also the 
question of where figural properties 
end and structural properties begin, 
also of where structural properties 
end and conceptual properties begin. 

We may actually have a continuum
here. With respect to some
categories (including classes, fundaments,
etc.) there may be a rapid transition
from figural to conceptual, thus
leaving no basis for a third factor. It is
likely that the factors in any row of
Table 1 are positively and sometimes
even substantially correlated. The
general question of correlations
among factors will be left for later
discussion.
We have a complete triad of factors
having to do with the seeing of
relationships and tests to measure
them that are similar except for content.
The analogies tests are well
known. A matrix test is essentially a
two-dimensional analogies test, examples
of which may be found in the
Raven Progressive Matrices series.
In the test Seeing Trends II, we find
the following type of item: “anger
bacteria camel dead excite.” The
task is to name the letter trend,
which, in this item, of course, is that
the initial letters are in alphabetical
order from “a” to “e.” In the Correlate
Completion II test, an illustrative
item reads: “am ma not ton
tool ____.” What word should come
next? Here it is not word meaning
that is important but letter se- 
quences. In the Seeing Trends II 
test, likewise, the word meanings are 
of no significance. Presumably, an 
analogies test utilizing letters only 
would do as well as a measure of this 
factor. 
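
The structural character of these two item types can be made concrete with a minimal sketch. It assumes that the trend item is solved by noticing that initial letters advance one step through the alphabet, and that the Correlate Completion item is solved by reversing the letters of the last word; the function names and the five sample words in the usage note are hypothetical and are not taken from the tests themselves.

```python
def initial_letter_trend(words):
    """Return True if the initial letters advance one step at a time through the alphabet."""
    initials = [w[0].lower() for w in words]
    return all(ord(b) - ord(a) == 1 for a, b in zip(initials, initials[1:]))


def complete_reversal_series(words):
    """Words come in (word, reversed word) pairs; return the reversal of the final unpaired word."""
    pairs = list(zip(words[0::2], words[1::2]))
    if not all(second == first[::-1] for first, second in pairs):
        raise ValueError("series does not follow the reversal rule")
    if len(words) % 2 == 0:
        raise ValueError("no unpaired word left to complete")
    return words[-1][::-1]


# Only letter order is inspected; word meaning plays no part in either function.
print(initial_letter_trend(["apple", "bread", "chair", "dance", "eagle"]))
print(complete_reversal_series(["am", "ma", "not", "ton", "tool"]))
```

Under the reversal rule assumed here, the series “am ma not ton tool” would be completed by “loot.” Neither function consults a dictionary, which is the sense in which word meanings are of no significance.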

In the row of Table 1 pertaining to 
patterns or systems, we have three 
factors, but they are much more dis- 
parate in kind than usual in this ta- 
ble. The clearest example of an educ- 
tion-of-patterns factor is in the 
middle column. The Circle Reason- 
ing test, adapted from Blakey (9), is 
similar to the Marks test of Thur- 
stone and to the Spatial Reasoning 
test of the AAF (21). In a sequence 
of symbols the examinee must dis- 
cover the principle by which certain 
symbols are marked, then he must 
mark a new set accordingly. In the 
Letter Triangle test, the letters are 
arranged in a different alphabetical 
pattern in each item. The examinee 
must discover the pattern and show 
this by filling a blank with a letter. 

Under the figural category we find 
the factor of spatial orientation, a 
well-known space factor. It is best 
defined as the ability to become 
aware of the spatial order or arrange- 
ment of objects perceived visually. 

Until the system of cognition factors
was conceived, the writer had
thought of spatial orientation as a
purely perceptual ability rather than
intellectual.⁵ Its place in the system
is regarded as tentative. We may
yet find another seeing-patterns factor
in which figural properties play a
more obvious role than they do in the
factor of spatial orientation. In a
real sense, an orientation within a
field of perceived objects is a pattern
or system, where spatial arrangement,
which includes the viewer, is
the principle. Shapes and sizes of objects,
which play a more obvious role
in the case of the other figural factors,
are of more indirect significance
in spatial orientation.

⁵ A perceptual factor is distinguished from an intellectual factor by the fact that no symbolic activity is clearly involved.

Under the conceptual category we 
find a factor that has been most dif- 
ficult to define. The best conception 
of it is that it represents an ability 
to define or structure problems. It 
has been a most consistent compo- 
nent of arithmetic-reasoning tests, but 
since such tests are psychologically 
complex, it has been difficult to de- 
termine just what aspect of solving 
problems of this type is the signifi- 
cant feature that requires the ability 
called general reasoning. By elimina- 
tion of many rival hypotheses, it 
is now rather clear that the factor 
pertains to the comprehension of the 
structure of a problem, at least of the 
arithmetical variety (19). Since such 
a structure is conceptual, the factor 
logically belongs in the column where 
it is placed in Table 1. The Ship
Destination test is a special type of 
arithmetical-reasoning test, which 
seems to come closer than any other 
to being a pure measure of the factor. 

In the next row of Table 1, for the 
discovery of problems, there is only 
one factor—sensitivity to problems, 
which is in the conceptual column.
The appearance of this factor parallel 
to general reasoning in the row pre- 
ceding, emphasizes the well-known 
observation that it is one thing to be 
aware that a problem exists and an- 
other thing to be aware of the nature 
of the problem. The titles of the tests 
are quite descriptive. A sample item 
from the test Seeing Problems asks 
the examinee to list as many as five 
problems in connection with a com- 
mon object like a candle. The test
Seeing Deficiencies presents in each
item the general plan for solving a
given problem, but the plan raises
some new problems. What are those
problems?
Whether we shall ever find parallel
factors for seeing problems or deficiencies
of figural and structural
types remains to be seen. Problems
of a figural type are faced in aesthetic
pursuits such as painting and architecture.
Problems of a structural
type might be faced in connection
with spelling or the development of
language. Tests pertaining to the
seeing of problems have thus far provided
no figural or structural bases
for problems. It should be fairly
easy to test the hypothesis that such
factors exist. If they do exist, their
possible implications for everyday
performance need further study.
In the investigation of planning
abilities (14, 15), two parallel factors
were found—perceptual foresight
and conceptual foresight, where
one was expected. The Competitive
Planning test was originally designed
by the AAF psychologists as a test
of foresight and planning (21). It
requires the examinee to imagine that
he is playing the game of completing
squares by drawing lines. He plays
for the two opponents and in each
item he has to tell the maximum
number of squares each opponent
can complete under the rules of the
game. The Route Planning test, another
AAF product, is a type of maze
problem. The examinee must tell
which of alternative points will have
to be passed through in going from
the starting point to the goal. In
both tests, perceived layouts are
used.
The test Pertinent Questions presents
in each item a need for decision
and the examinee is asked to
state what facts he should consider
in reaching a decision. For example,
a new graduate is offered positions 
in two different cities. What should 
be the deciding considerations? In 
the Alternate Methods test, a practi- 
cal problem is given, with available 
objects that may be used. The ex- 
aminee is to give several alternative 
solutions that he considers most adequate.
Porteus has maintained that his
series of maze tests measure foresight.
He can well claim support
from the factor-analysis results just
mentioned. The type of foresight
measured by maze tests, however, is
of a concrete variety. This ability
may be important for the architect,
the engineer, and the industrial-layout
planner. It may not be found related
to the abstract type of planning
that we find in the political
strategist and the policy maker. So
far as our results go, the maze test
should by no means be offered as a
test of general intelligence. This
statement might need modification,
however, after the maze test is factor
analyzed in a population of lower
general intellectual level (where general
intelligence is defined operationally
as an average of all intellectual
abilities). In a population of “high-level
personnel,” we can say that a
maze test measures most strongly the
factor of perceptual foresight and,
incidentally, to some degree the
factors of visualization and adaptive
flexibility (18).

The appearance of a factor called
penetration in the last column of
Table 1, along with conceptual foresight,
calls for comment. A factor of
penetration was hypothesized in the
analysis of creative abilities and
was not found (31). An unidentified
factor found there might well have
been penetration. A factor has been
so identified in a more recent analysis
that emphasized creative ability tests
(20). It is strongly loaded on a test
called Social Institutions, which asks 
what is wrong with well-known in- 
stitutions such as tipping. It was de- 
signed as a test of sensitivity to prob- 
lems, and it has consistently had a 
loading on that factor. In the first 
creativity analysis, two scores were 
based upon this test, one being the
total number of low-quality or obvious
defects and the other being the
total number of high-quality or “penetrating”
defects—defects that can
be seen only by the far-sighted per- 
son. As a matter of fact, the two 
scores had much to do with effect- 
ing a separation of the seeing-prob- 
lems tests into two groups, one of 
which might have been identified as 
the penetration factor. 

It is quite possible that the factor 
of penetration and the factor of con- 
ceptual foresight are one and the same. 
They came out in two different an- 
alyses that had no crucial tests in 
common. It would be a good hy- 
pothesis that they are identical and 
a good prediction would be that if the 
four tests listed in Table 1 were an- 
alyzed in the same battery they 
would define a single factor, not two. 

There is the apparent possibility 
for the existence of a foresight factor 
involving structural arrangements, 
but the scope and usefulness of such 
a factor would seem to be questionable.


Production Factors—Convergent Thinking

The second large group of think- 
ing factors has to do with the produc- 
tion of some end result. After one 
has comprehended the situation, or 
the significant aspects of it at the 
moment, usually something needs to 
be done to it or about it. In the an- 
alogies test, for example, having seen 
the relation between the first pair of 
elements of an item we must then 
find a correlate to complete another
pair. Having understood a problem, 
we must take further steps to solve 
it. 

Like the cognition factors, the 
production factors show some prom- 
ise of falling under the rubrics of 
figural, structural, and conceptual, 
but here the picture is less complete. 
The kinds of things produced are
more numerous than the kinds discovered.
There are no identities of
things in the two lists, but there are
a few parallels or relationships. For
example, corresponding to the comprehension
of words, there are factors
concerned with the production of
words; corresponding to the discovery
of classes there is the act of naming;
corresponding to the discovery
of relations there is the production of
correlates; and corresponding to the
discovery of systems there is the production
of order. But with these
few instances, the connections and
parallels seem to end.

It was announced earlier that the
production factors fall into two
groups—convergent-thinking factors
and divergent-thinking factors.
Such a distinction seems not to have
been emphasized in prior literature
on thinking. In the case of some of
the production factors, the distinction
is not complete, but in most cases
it is striking.
In convergent thinking, there is
usually one conclusion or answer that
is regarded as unique, and thinking
is channeled or controlled in the direction
of that answer. In tests of
the convergent-thinking factors, there
is one keyed answer to each item.
Multiple-choice tests are well adapted
to the measurement of these abilities.
In divergent thinking, on the other
hand, there is much searching or going
off in various directions. This is
most clearly seen when there is no
unique conclusion. For the measurement
of such abilities, completion
tests are almost a necessity. The
distinction is not so clear in some
problem-solving tests, in which there
must be and usually is some divergent
thinking or search as well as ultimate
convergence toward the solution.
But the processes are logically
and operationally separable, even in
such activities.
In Table 2 we have those production
factors identified as dealing with
convergent thinking. There are five
potential triads of factors, depending
upon the kind of result produced:
names, correlates, orders, changes, or
unique conclusions. In two cases,
structural-type tests have loadings on
figural factors; thus a similar matrix
has been again adopted.
In the first row are factors having
to do with the production of names.
The two factors there are again contrasted
in terms of the concrete-abstract
dichotomy. They differ,
also, by the fact that the one has to
do with the naming of particulars,
while the other has to do with the
naming of classes. French (8) has a
factor of naming, which has been
called object naming here to distinguish
it from the factor of abstraction
naming, which was just recently discovered.
The appearance of a test
of Color Naming under the rubric
of “figural” calls for broadening the
conception of this class to include
color as a figural property. Names for
objects distinguished for their structural
properties are evidently not
very common. If good examples can
be found, we may find a third naming
factor. In the name of the factor
of abstraction naming, the term “abstraction”
may prove to be too comprehensive.
The two illustrative
tests mentioned might suggest that
the ability is restricted to the naming
of classes. The results show that
it is actually broader than that, since
TABLE 2
PRODUCTION FACTORS—CONVERGENT THINKING

Type of result                              Type of content
produced            Figural                 Structural              Conceptual

Names               Object naming                                   Abstraction naming
                      Form Naming                                     Picture-Group Naming
                      Color Naming                                    Word-Group Naming

Correlates                      Eduction of correlates
                                  Correlate Completion
                                  Figure Analogies Completion

Orders                                                              Ordering
                                                                      Picture Arrangement
                                                                      Sentence Order

Changes             Visualization                                   Redefinition
                      Spatial Visualization                           Gestalt Transformation
                      Punched Holes                                   Object Synthesis

Unique conclusions  Symbol substitution     Numerical facility      Symbol manipulation
                      Sign Changes            Numerical Operations    Symbol Manipulation
                      Form Reasoning                                  Sign Changes II


it pertains to the naming of relations
also, in other tests. 

With three factors having to do
with the seeing of relationships, we
might well expect three corresponding
factors concerned with the eduction
of correlates. As a matter of
fact, the project has for some time
anticipated at least two such factors,
perceptual and conceptual, and has
designed tests that were expected to
effect the expected separation. To
date, only one eduction-of-correlates
factor has been clearly indicated,
and both figural and structural
tests have loadings on it. The
Verbal Analogies Completion test,
which we hoped would help to discover
a conceptual-correlates factor,
turned out to be a test of expressional
fluency. Evidently the eduction-of-correlates
aspect of the test
was made so easy that little variance
in this ability, if it is separate, was
manifested. On the other hand, having
educed the correlate, thinking of
the needed word provided the chief
basis for individual differences in
scores, and hence the loading on expressional
fluency. It can be predicted
that with the appropriate
tests, three eduction-of-correlates factors
will become evident. Because of
the difficulty of separating them, it
can be predicted that the intercorrelations
of these three factors will be
found to be substantial.

In the investigation of planning 
abilities it was hypothesized that 
there would be an ability to see or to
appreciate order or the lack of it, as 
a feature of preparation for planning. 
It was also hypothesized that there 
would be an ability to produce order 
among objects, ideas, or events, in 
the production of a plan. A single 
ordering factor was found. Since the 
three tests designed to measure sensitivity
to order had low and insignificant
loadings on the factor, while the
three designed to measure the pro- 
duction of order had significant and 
even substantial loadings, the factor 
seems to belong among the produc- 
tion factors. The Picture Arrange- 
ment test presents a four-part car- 
toon strip in which the parts are out 
of correct temporal order. The ex- 
aminee has to state the best order. 
The Sentence Order test presents in 
each item three sentences, each stat- 
ing an event, the examinee being 
told to rearrange them. 

It remains to be seen whether 
ordering in terms of figural and 
structural properties will call for 
additional ordering factors to help 
complete the matrix of Table 2. 
Figural ordering may be a significant 
aspect of pictorial art. It is not so 
easy to see where a structural order- 
ing would be of consequence. 

In the next row of Table 2 we find 
the factor of visualization, which has 
been known for some time, and the 
factor of redefinition, which was found 
originally in the first creativity an- 
alysis (31). The thing produced in 
both instances is some kind of change 
or rearrangement or shift. The Spa- 
tial Visualization test is Part VI of 
the Guilford-Zimmerman Aptitude 
Survey. In each item certain move- 
ments of a pictured alarm clock are 
indicated and the examinee is to 
select the view that would be seen af- 
ter the movements. The Thurstone 
Punched Holes test shows a paper
being folded and a hole or holes then
cut out. The examinee is to tell how
the paper would look after unfolding.

The redefinition factor involves 
shifts of meaning or use of objects or 
parts of objects. The test Gestalt
Transformation asks such questions
as: With which of the following objects
could one best start a fire: A.
fountain pen, B. onion, C. pocket
watch, D. light bulb, E. bowling ball?
The keyed answer is C, since the
crystal can be transformed from a
face cover to a condensing lens. The
Object Synthesis test asks such questions
as: Given pliers and a shoe
string, what could you make? A good
answer would be “pendulum” or
“plumb bob.” In either case the objects
play new roles in the combination.

The last row of factors in Table 2
presents an interesting triad. Although
there are one or two questions
that can be raised about their placement,
to be mentioned later, it is
quite clear that they all involve rigorous
operations with symbols leading
to unique conclusions. The factor
numerical facility is the very well-known
ability to operate with numbers,
where both speed and accuracy
are significant. The two new factors,
symbol substitution and symbol manipulation,
were regarded as one factor
until recently. In one analysis
the factor looked like a substitution
ability and in another analysis it
looked like a manipulation ability.
In a recent analysis (20) the two
were found to be separate.

To distinguish these factors we
must consider the different kinds of
tests that represent the two. In
Sign Changes, the examinee is told
before each block of items what
changes to make in algebraic signs,
e.g., “replace − with ×” and “replace
+ with −.” He applies the
new rules to several simple equations
such as “3−6=?”
In the Form Reasoning test, equations
are stated in the form of combinations
of simple geometric forms.
Some definitions are first given, stating
that a combination of two forms
can be replaced by another single one,
e.g., a square. With these substitutions of
single forms for pairs, combinations
greater than pairs must be reduced to
single symbols, taking each pair in
turn.
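
The Sign Changes rule can be illustrated with a short sketch. It assumes that each rule simply relabels one arithmetic sign as another before the item is evaluated; the function name and the dictionary encoding of the rules are hypothetical conveniences, not part of the test.

```python
import operator

# Ordinary meanings of the four signs (ASCII stand-ins for the printed symbols).
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def apply_sign_changes(left, sign, right, rules):
    """Evaluate `left sign right` after replacing the printed sign as the rules direct."""
    new_sign = rules.get(sign, sign)  # signs not named in the rules keep their meaning
    return OPS[new_sign](left, right)


# The two rules quoted in the text: replace "-" with "x", and replace "+" with "-".
rules = {"-": "*", "+": "-"}
print(apply_sign_changes(3, "-", 6, rules))  # "3 - 6" is read as 3 x 6
```

Under these rules the item “3−6=?” is read as 3×6 and the answer is 18. The arithmetic itself is trivial; what the item demands is that the examinee hold the switched meaning of the sign.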

It is difficult to accept fully the 
placement of symbol substitution in 
the figural column. If all tests loaded 
on it were like Form Reasoning, 
where the rigorous definitions and 
operations are all in terms of figures, 
the placement would be quite rea- 
sonable. But certain features of the 
Sign Changes test suggest that it is 
not figural properties, as such, that 
are important. They may serve 
merely to identify the symbols. In 
the Sign Changes test it is the opera- 
tion that the symbol stands for that 
is important. 

The Sign Changes test was originally
designed as a flexibility test;
the Form Reasoning test was not. In 
both, the readiness to switch the 
meaning or significance of symbols is 
the obvious peculiar feature. Per- 
haps the emphasis should be placed 
on the word “switch.” It may be 
that this factor will eventually be 
placed in the family of flexibility 
factors, which appears in Table 3. 
There is no evidence against the hy- 
pothesis that symbol substitution is 
the same as the present factor of 
adaptive flexibility, represented par- 
ticularly by the Match Problems 
test. As a matter of fact, Sign 
Changes had a significant loading on 
adaptive flexibility in the creativity 
analysis (31). Form Reasoning has 


TABLE 3
PRODUCTION FACTORS—DIVERGENT THINKING

Type of result                              Type of content
produced            Figural                 Structural                  Conceptual

Words                                       Word fluency                Associational fluency
                                              Prefixes                    Controlled Associations II
                                              Anagrams                    Associations III

Ideas                                                                   Ideational fluency
                                                                          Plot Titles
                                                                          Consequences

Expressions                                                             Expressional fluency
                                                                          Vocabulary Completion
                                                                          Similes

Shifts              Flexibility of closure  Adaptive flexibility        Spontaneous flexibility
                      Hidden Pictures         Match Problems              Brick Uses
                      Gottschaldt A           Planning Air Maneuvers      Unusual Uses

Novel responses                                                         Originality
                                                                          Plot Titles (cleverness)
                                                                          Symbol Production

Details             Elaboration*                                        Elaboration*
                      Planning Elaboration                                Planning Elaboration
                      Figure Production                                   Figure Production

* At present regarded as the same factor, but future results may indicate two separate factors.
never had an opportunity to show
such a loading. 

Defining the factor of symbol man- 
ipulation are the two tests Symbol 
Manipulation and Sign Changes II. 
Symbol Manipulation provides some 
simply defined symbols, such as: E 
means equal to; NG means not 
greater than. Each item then pro- 
vides a statement such as: xEy and 
yNGz; which of the following state- 
ments can logically be made: xSz, 
xNGz, etc. This test was designed 
originally for the factor of logical 
evaluation (see Table 4), and has 
usually shown some relationship to 
that factor, but it also helps to define 
the factor of symbol manipulation. 
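
The kind of inference a Symbol Manipulation item calls for can be sketched as follows. The sketch assumes that E and NG behave as ordinary “equal to” and “not greater than” on numbers, and it checks a proposed conclusion by exhaustive search over a few small integers; the function name, the tuple encoding of statements, and the search method are illustrative assumptions, not the test's own procedure.

```python
from itertools import product

# The two symbols defined in the text, read as ordinary numerical relations.
RELATIONS = {
    "E": lambda a, b: a == b,   # "equal to"
    "NG": lambda a, b: a <= b,  # "not greater than"
}


def follows(premises, conclusion, variables=("x", "y", "z"), values=range(3)):
    """Return True if the conclusion holds in every assignment that satisfies all premises."""
    for combo in product(values, repeat=len(variables)):
        env = dict(zip(variables, combo))
        if all(RELATIONS[rel](env[a], env[b]) for a, rel, b in premises):
            a, rel, b = conclusion
            if not RELATIONS[rel](env[a], env[b]):
                return False  # a counterexample: premises true, conclusion false
    return True


premises = [("x", "E", "y"), ("y", "NG", "z")]
print(follows(premises, ("x", "NG", "z")))  # xNGz can logically be stated
print(follows(premises, ("x", "E", "z")))   # xEz cannot, since z may exceed x
```

Three distinct values suffice here because any ordering of three variables can be realized with them.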

The test Sign Changes II presents
simple “equations” such as 1+2
=4×1, the two sides of which are
not actually equal as the statement
stands. The examinee is to say what
interchange of algebraic signs will
make the equation correct. In the
illustration just given, if × and −
are interchanged the equation will
balance.
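
The search an examinee performs on a Sign Changes II item can be sketched as a small brute-force procedure. It assumes each side of the “equation” is a single operation on two numbers and that an interchange swaps two signs wherever they occur; the tuple encoding and the function names are hypothetical.

```python
from itertools import combinations
import operator

# Ordinary meanings of the four signs (ASCII stand-ins for the printed symbols).
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(side, swap):
    """Evaluate one side, given as (number, sign, number), under a sign interchange."""
    a, sign, b = side
    return OPS[swap.get(sign, sign)](a, b)


def balancing_interchanges(left, right):
    """List every pair of signs whose interchange makes the two sides equal."""
    found = []
    for s, t in combinations(OPS, 2):
        swap = {s: t, t: s}
        try:
            if evaluate(left, swap) == evaluate(right, swap):
                found.append((s, t))
        except ZeroDivisionError:
            continue  # an interchange that forces division by zero cannot balance
    return found


# The illustration from the text: 1 + 2 = 4 x 1.
print(balancing_interchanges((1, "+", 2), (4, "*", 1)))
```

For the illustrative item, the only balancing interchange is that of × and −, which turns the right side into 4−1 and leaves the left side as 1+2.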


From these two tests alone, it is
not easy to see exactly what kind of
ability is involved in common. One
clue may be that both tests involve
equations. A third test with a significant
loading in one analysis is a
number-series test. This test does
not involve equations. In one analysis
the numerical-facility factor
was distinct from symbol manipulation,
consequently we cannot identify
the latter with the former. Further
intensive work is obviously necessary
in the area of these factors. Abilities
that may be of some significance for
success in mathematics may be
found here.


Production Factors—Divergent Thinking

The divergent-thinking factors are
arranged in a matrix in Table 3
with the three column categories that
have now become familiar.
There are more vacancies to be filled,
if the system is indeed as applicable
as it promises to be.

In the first three rows of the table
we find the four well-established fluency
factors. In the first row are


TABLE 4
EVALUATION FACTORS

Figural
  (Perceptual evaluation)*
    Ratio Estimation
    Figure Estimation
  Length estimation
    Pattern Assembly
    Shorter Path

Structural

Conceptual
  Logical evaluation
    Logical Reasoning
    Inferences
  Experiential evaluation
    Unusual Details
  Judgment
    Practical Judgment
    Practical Estimation
  Speed of judgment
    Color-Form Sort Time
    Social Judgments Time

* Probably a composite of factors, including length estimation,




two fluency factors having to do with 
the production of single words. In 
the case of the factor of word fluency, 
meaning is of no importance. The 
usual tests of this factor merely spe- 
cify that the words shall begin or end 
with a specified letter, prefix, or suf- 
fix. Only such structural require- 
ments are to be met. The examinee 
need not even know the meanings of 
the words he gives. In the case of 
associational fluency, however, mean- 
ing is an essential requirement. The 
words given must be synonyms, as 
in Controlled Associations II, or 
must be related in some meaningful 
way to stimulus words or ideas. In
Controlled Associations II, the examinee
gives as many as three synonyms
to each stimulus word. In
Associations III, two words are
given, differing in meaning, and the
examinee must give one word that is a
synonym to both. For example, the
word “lie” would be given as a syno- 
nym to both “recline” and “deceive.” 
It does not seem very likely that 
an ability will be found for the first 
cell in Row 1 of the table. This would 
call for the production of words satis- 
fying specified figural requirements. 
Yet, tasks can be thought of to meet 
this case, for example, the writing of 
headlines, the production of esthetic 
effects with words, and so on. It does 
not seem likely, however, that there
should have developed in human
makeup a unitary ability of this kind. 
The second row of the table offers
some interesting possibilities. The
speed of calling up ideas expressible
in verbal form can be tested by different
kinds of tasks. The two examples
of tests given were designed
for the study of creativity. The Plot
Titles test of fluency is scored by the
total number of low-quality titles
that can be suggested for a short
story plot in a given time. The Consequences
test is scored similarly,
but the responses are consequences
foreseen as a result of some drastic 
change, such as everyone going blind. 

It can well be questioned whether 
fluency of verbal responses of such 
kinds is strongly related to fluency 
of ideas of a mechanical, or musical, 
or pictorial kind. Fluency tests have 
been commonly cast in verbal form. 
Fluency in the production of figures 
and fluency in the production of 
things distinguished by their struc- 
tural properties may well be separate 
factors, both distinct from the idea- 
tional-fluency factor now known. The 
exploration of such possibilities would 
seem to be a fruitful route to take in 
the investigation of creativity. 

The separateness of the factor ex- 
pressional fluency from ideational 
fluency indicates that the ability to 
have ideas and the ability to put 
them into words are different things.
Since the examinee must state ver- 
bally his ideas in tests of ideational 
fluency, it might be supposed that 
his ability to express himself is in- 
cluded or is also being tested. But 
apparently in such a test the expres- 
sional problem is not a serious one. 
We present other tests in which the
idea is given and the examinee must
put it into words, usually in more
than one way. The expressional
problem is then more difficult, the
test giving us variance in the expressional
factor. In the Vocabulary
Completion test, a stimulus word is
used in a brief context, enough to
indicate its meaning, and the examinee
has to give the word. In the
Similes test, the examinee must give
more than one completion to a simile.
In a Verbal Analogies Completion
test, which was designed to measure
another factor, we found that the
leading variance is in the expressional-fluency
factor.

The only complete triad in Table
3 is a set of flexibility factors, the
best-known of which is adaptive
flexibility. The three factors involved 
are not clearly parallel in all respects. 
They have in common the feature 
that sudden shifts of activity occur— 
shift of organization of a figure, shift 
of set or approach to a problem, or 
shift of category of responses, re- 
spectively. Thurstone discovered the 
flexibility-of-closure factor in his an- 
alysis of perception (28) and found 
that the factor had relations indicat- 
ing its intellectual importance. 

The most consistently representa- 
tive test of the factor of adaptive flexi- 
bility is the Match Problems test. 
This test is based upon the old, fa- 
miliar puzzle or game of removing a 
specified number of match sticks in
order to leave a specified number of 
squares. In order to measure flexi- 
bility, the problem changes dras- 
tically from one item to the next, re- 
quiring very unusual solutions; solu- 
tions such as the average person 
would not expect. For example, at 
first the examinee is led to expect 
that the remaining squares will be 
of the same size, but then comes an
item in which they must be of unequal
size. Another item requires
that a smaller Square be left within a 
larger one, and so on. 
In an unpublished study, a test 
involving Gottschaldt figures came
out as strongly loaded on adaptive 
flexibility as did Match Problems, 
In the same analysis, a test of In- 
sight Puzzles also had a similar load- 
ing. Thus, in this case, a perceptual, 
a structural, and a conceptual test 
had strong loadings on the same factor.
There is therefore the possibility
that flexibility of closure and adaptive
flexibility are one and the same factor
and that this factor cuts across all
three columns of the matrix. In an
analysis where perceptual, structural,
and conceptual flexibility tests are
all liberally represented, however, it
can be predicted that three factors
will be found. If so, they are probably
substantially intercorrelated.
If there are three such factors, the
factor of spontaneous flexibility would
have to be moved to another row of
the matrix to be replaced by a conceptual-adaptive-flexibility
factor.
The factor of spontaneous flexibility
has appeared persistently but
never with great strength or stability.
The Brick Uses test, flexibility
score, is the best clue to its nature.
This score is the number of
runs of responses. The examinee is
told to name all the uses he can think
of for a common brick, in eight minutes.
A “run” of responses is a sequence
of uses all of the same class,
such as the use of bricks as building
material or as missiles, and so on. The
test Unusual Uses calls for
several unconventional uses for each
of a number of objects, the number
given being the score. Since only
verbal tests of this factor have been
analyzed, nothing can be said regarding
the possibility that there are
parallel factors involving figural and
structural contents.
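
The run-counting idea behind the Brick Uses flexibility score can be shown in a brief sketch. It assumes that each response has already been assigned to a class by a scorer and that the score is the number of maximal stretches of consecutive same-class responses; the class labels in the example (including “weight”) are hypothetical.

```python
from itertools import groupby


def count_runs(response_classes):
    """Count runs: maximal stretches of consecutive responses falling in the same class."""
    return sum(1 for _ in groupby(response_classes))


# Classes assigned, in order given, to six hypothetical responses to "uses for a brick".
classes = ["building", "building", "missile", "building", "weight", "weight"]
print(count_runs(classes))  # building | missile | building | weight
```

The six responses above form four runs. An examinee who gave the same six uses grouped by class would produce only three runs, so the score reflects how often he shifts category rather than how many uses he names.
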
It is of some interest to attempt to
relate spontaneous flexibility to other
concepts in psychology. Essentially
it appears to be a disposition to avoid
repeating one’s self. This suggests a
relation to Thorndike’s concept of
refractory phase or to Hull’s concept
of reactive inhibition. A hypothesis
to be tested would be that tests designed
to measure individual differences
in tendency to show refractory
phase of the Thorndikian type and
tests to show degree of tendency to
reactive inhibition indicate the same
factor as do tests of spontaneous
flexibility.
The results continue to show that
originality is operationally definable
as the likelihood of giving unconventional,
clever, or remotely associated
responses to test items (30).
It is measurable in terms of number 
of clever titles given to story plots, 
clever “punch lines” for cartoons, 
remote consequences to events, and 
idiosyncratic word associations. In 
two analyses there has been oppor- 
tunity for a cleverness factor to sep- 
arate off from the rest, but this did 
not occur. While the factor thus 
seems to be a rather broad one, it 
may well be asked whether such a 
factor, measured only by means of 
verbal tests, is significantly related 
to original production in nonverbal 
activities such as graphic arts, music, 
or inventive engineering. 

We have had only one originality 
test that is at least partly nonverbal 
—the Symbol Production test. This 
test was designed for another pur-
pose, namely to test the hypothesis
that there is a separate ability to
symbolize ideas in terms of simple
line drawings. Each item presents a
statement, such as “ring the bell,”
of which the two italicized words are
to be represented by two symbols.
The score is the number of nouns and
verbs symbolized in the testing time.
The test is not entirely nonverbal, of
course, although the thing produced
is figural. There was a second test
(Line Drawing) requiring the pro-
duction of line symbols for given
adjectives in the same battery with
the Symbol Production test. These
two tests might have given rise to a
separate factor, but they did not.
Nevertheless, the writer is of the
opinion that the problem of whether
there are originality factors peculiar
to nonverbal areas is still an open
one.

The elaboration factor is an ability 
to provide details working toward 
completion, when a part or an outline 
is given. The test Planning Elabora-
tion presents the bare outline of a
plan to which details must be added
to make it effective. In the Figure
Production test, a simple line is given, 
to which the examinee is asked to add 
lines to complete an object. The 
score depends upon the amount of 
detail added. 

Here we have a clearly verbal test 
and a clearly figural test (although a 
meaningful object is usually pro- 
duced) both with relation to the same 
factor. There is still the possibility 
that there are two (or three) elabora- 
tion factors, distinguished in terms 
of content, with enough relationship 
between them to cause the factors to 
appear to be one. It will take a new 
analysis in which at least three good 
figure-elaboration tests and three 
good verbal-elaboration tests (not to 
forget a triad of structural-elabora- 
tion tests, also) should be included to 
determine how many elaboration 
factors there are. 

Considering the factors in the di- 
vergent-thinking category together, 
it is obvious that the freedom to 
change direction of thinking varies 
considerably from one instance to 
another. Different degrees of situa- 
tion-imposed restriction are involved. 
But generally, within whatever lim- 
its that are imposed by external re- 
strictions, the need for rejecting or 
superseding a response and for try- 
ing or producing a new one is the 
common element in this group of fac- 
tors. There is also a difference in the 
amount of self-imposed restriction or 
freedom. This depends upon the in- 
dividual rather than upon the situa- 
tion. It is largely in this source of 
variation that we find the divergent- 
thinking factors. 


Evaluation Factors 


Evaluation factors have to do with 
decisions concerning the goodness, 
suitability, or effectiveness of the 
results of thinking. After a discovery
is made, after a product is achieved,
is it correct, is it the best that we can
do, will it work? This calls for a
judgmental step of some kind. It
was our hypothesis in the project
that the ability to make such deci- 
sions will depend upon the area 
within which the thinking takes 
place and the criteria on which the 
decision is based. The results indi- 
cate several evaluation factors. They 
have been placed in the customary 
three-column matrix in Table 4, in 
spite of the fact that none have been 
found to fit the structural column. 
In this group of factors there is no 
good way of distinguishing rows. 
The domain of evaluation factors 
has been less well explored than the 
other intellectual domains. 

The least that can be said is that 
the perceptual-conceptual dichotomy 
applies in this area of abilities. Al- 
though our analysis showed only one 
factor applying to judgments of 
figural material, it is likely that in 
this subarea of evaluation alone there 
are a number of judgment factors. 
For this reason the factor of per-
ceptual evaluation has been placed in
parentheses in Table 4. For ex-
ample, a more restricted factor of
length estimation has been found (21).
The search for such factors carries us
over into the whole realm of psycho-
physical judgment. It would be dif-
ficult to say whether factors of this
kind belong under the general head-
ing of thinking or under the heading
of perception. In view of the known
complexity of psychophysical judg-
ments in general, their place in the
intellectual group can be defended.

The best established evaluation
factor is that of logical evaluation.
This is defined as the ability to
judge the soundness of conclusions
where logical consistency is the cri-
terion. The factor has sometimes
been called “deduction,” with the
belief that it is the ability to draw
conclusions logically consistent with
premises. If this were the case, the
factor would belong with the produc-
tion-factors group. Most tests in
which the factor has been found to be
a component are of the true-false or
multiple-choice form, in which the
examinee is given conclusions; he
need not produce them. It is diffi-
cult to say whether he actually does
produce them for himself first and then
find them among the answers pro-
vided. But whether he does this or
not, he must necessarily make a
judgment as to the correctness of an
answer, his own answer or the one
given him. Even in a completion
test, this step would be necessary.
It seems preferable, therefore, to call
the factor logical evaluation and
list it among the evaluation factors.

It was hypothesized that there
would be a factor in which evalua-
tion is made on the basis of past ex-
perience. Such a factor was found,
and it is represented best by the test
of Unusual Details. In this test the
examinee is asked essentially “What
is wrong with this picture,” in which
there are two features that are incon-
gruous or inconsistent with common
experience. In defining this factor,
whether the emphasis should be
placed upon the supply of past ex-
perience or upon an ability to utilize
that experience is not known.

The factor called judgment is listed
with some hesitation. It was found
repeatedly, but rather weakly, in
AAF research (21). It is best repre-
sented by a test in which a practical
difficulty was described and several
alternative solutions listed [...]
considered? In common terminology,
the ability might be recognized as
wisdom or common sense. In ap-
titudes-project research, there is evi-
dence that this AAF judgment fac-
tor may be the same as the one [...]
redefinition. If this is the case, it is
not easy to say where to place the
emphasis in defining the factor.

The factor speed of judgment was
found by Thurstone in his analysis
of perceptual abilities (28). The
speed with which the examinee com-
pletes the sorting of objects accord-
ing to color or form and the speed
with which he checks traits that ap-
ply to himself are both measures of
the factor. It is thus shown as cut-
ting across the three content cate-
gories. It might well be classed as a
temperament trait rather than an
ability.


Memory Factors 


There is little doubt about the 
grouping of the remaining factors 
under the heading of memory factors. 
Collecting all such factors from vari- 
ous sources, we find that seven qual- 
ify for this category. A recent an- 
alysis by Kelley (27) has done much 
to verify and complete the picture 
for this group. It is possible to or-
ganize these factors in the three
columns of the now familiar cate-
gories as to content, and in three rows
as to the kind of thing or aspect in-
volved (see Table 5). The titles of
the tests representing each factor are 
usually quite descriptive. 

The best-known of the memory 
factors is rote memory; the ability to 
learn and to remember things asso- 
ciated, where meaning is of little or 
no importance. In the AAF research 
this factor was called “associative
memory” for the reason that paired-
associate learning was typical of the 
tests of it. There was a need, also, 
of distinguishing it from the factor 
of visual memory, where sheer con- 
tent is important rather than associa- 
tive connections between contents. 
Since Kelley (27) has demonstrated 
another associative-memory factor 
in the form of meaningful memory, 
however, it seems best to return to 
the name of rote memory. The place- 
ment of both in an associative row 
of the matrix indicates their common 
associative property. The vacancy 
under the figural heading in this row 
calls for the hypothesis that there is 
an undiscovered factor pertaining to 
the learning of associative connec- 
tions between figural contents. 


TABLE 5
A MATRIX OF MEMORY FACTORS

Thing or aspect                            Type of Content
remembered            Figural                     Structural          Conceptual

Associative                                       Rote memory         Meaningful memory
connections                                         Word-Number         Sentence Completion
                                                    Color-Word          Related Words

Content               Visual memory                                   Memory for ideas
                        Reproduction of Designs                         Memory for Ideas
                        Map Memory                                      Limericks
                      Auditory memory
                        Musical memory
                        Rhythm

Span                                              Memory span         Integration I
                                                    Letter Span         Signal Interpretation
                                                    Digit Span          Combat Planes

The factor of visual memory has
been known for some time (21). The 
factor may be regarded as a rather 
photographic-memory ability. Some 
individuals are recognized as stand- 
ing out in this respect, for example 
certain police officers who remember 
faces and motor-vehicle license num- 
bers remarkably well. In tests, the 
evidence of remembering of this type 
may be in the form of reproductions 
(Reproduction of Designs test), or 
recognition (an AAF Map Memory 
test), or verbal descriptions (an- 
other AAF Map Memory test). 

The listing of a factor with the 
name of auditory memory represents 
in part the writer’s somewhat risky 
hypothesis. It is based upon a factor 
found by Karlin (26) in tests of musi-
cal memory (for melody and rhythm).
French (4) called it “musical mem-
ory,” which is the cautious thing to
do. The name “auditory memory”
used here implies some confidence in
the prediction that when nonmusi-
cal auditory-memory tests are in-
cluded with musical-memory tests
in the same analysis, the same factor
will apply to both.

AAF research results hinted at the
existence of a content-memory or
substance-memory factor but did
not demonstrate it. Kelley's results
give evidence for such a factor. It is
the memory for ideas, which are
probably not expressed verbatim in
recall tests. Further support for this
factor is desirable. The hypothesis
that there is a “content” factor in
the structural column is still to be
investigated. It is not easy to say
what this would be like. The mem-
ory for a route might qualify.

Memory-span tests, composed of
digits and letters, have in common a
memory-span factor. This factor be-
longs in the structural column. Inci-
dentally, it is interesting that mem-
ory-span tests have been rather popu-
lar components of general-intelligence
scales. It turns out that they meas- 
ure primarily a rather special kind 
of memory ability whose social im- 
portance cannot be very great. Tel- 
ephone operators come to mind first 
in this connection. A general re- 
mark may be made, prompted by 
the emphasis upon memory-span 
tests as measures of intelligence, that 
although many tests correlate highly 
with chronological age, this does not 
ensure that they measure any very 
significant aspect of intelligence. 

In the conceptual column, Inte-
gration I, a factor found in AAF re-
search, is proposed as a memory-
span factor. The tests Signal Inter-
pretation and Combat Planes re-
quire the examinee to keep in mind
a relatively large number of detailed
rules for success in them. Kelley (27)
had one span test in which the con-
tent was in the nature of lists of
tasks to be done, the length varying
as in digit and letter-span tests. It
came out with those other span tests
on his memory-span factor. It can be
predicted that if there were other
idea-span tests, and perhaps some
Integration-I tests in the battery,
two span factors would be found.
The span factors are probably sig-
nificantly correlated. The vacant
cell in Row 3 of Table 5 suggests that
the way is open for someone to see
whether a third memory-span factor
will be found where the contents are
figural.

* Another hypothesis is tenable with regard
to Integration I, however. It might be identi-
cal with the factor memory for ideas.

To digress somewhat from an ac-
count of the factors, it may be
pointed out that the fact that there
are several distinct memory abilities
may explain some of the phenomena
observed in memory experiments,
particularly where results are dis-
cordant. Results from memory ex-
periments may differ markedly, some-
times, depending upon the kind of
material and the thing or aspect em-
phasized. For example, the relative
strength of backward vs. forward
associations differs when the material
is composed of visual forms or is com-
posed of syllables. In transfer ex-
periments, in view of the different
abilities involved, it should not be
surprising that transfers of gains
in memorizing skills should be so
limited. It would be interesting to
test the hypothesis that transfer will
be relatively greater between tasks
that depend upon the same memory
factor or upon the more strongly cor-
related factors. The same hypothe-
sis could be stated with respect to
thinking factors and other ability
factors generally.


Discussion 


The account of the known intel- 
lectual factors and the system into 
which they seem to fall calls for the 
discussion of some general questions. 

There are implications for factor
theory and for its application to psy-
chological research in general. There
are implications for general psycho-
logical theory and for the practices of
intelligence testing.


Implications for Factor Theory and
Factor Analysis


A theory or a method should be
judged by its fruits. If the results
that have been reported here con-
tribute to psychological understand-
ing and, through that, to useful psy-
chological practice, factor analysis
has met this kind of test. The
mathematical model that has been
applied, which conceives of individ-
ual differences in intellectual per-
formance as being represented by a
coordinate system of n dimensions,
has served certain purposes. While
it may be shown at some future time
that the model is not the best that
could be applied, its power to gen- 
erate new psychological ideas and to 
extend considerably the conception 
of the realm of intellect has been dem- 
onstrated. 

The average reader will no doubt 
be surprised by the large number of 
dimensions that seem to be required 
to encompass the range of intellectual 
aspects of human nature. Some 40 
factors are reported as being known 
and a great many additional un- 
known factors are forecast. This 
would seem to go against the scien- 
tific urge for parsimony. 

The principle of parsimony has led 
us in the past to the extreme of one 
intellectual dimension, which every- 
one should now regard as going too 
far in that direction. There is ac- 
tually no fixed criterion for the satis- 
faction of the principle of parsimony. 
In science we can satisfy the princi- 
ple to some degree whenever the num- 
ber of concepts is smaller than the 
number of phenomena observed. 
Forty, sixty, or even a hundred fac- 
tors would certainly be a smaller 
number of concepts than the number 
of possible tests or the number of ob- 
servable types of activities of an in- 
tellectual character. In this sense the 
principle of parsimony has been satis- 
fied. 

The number of the factors is less 
unattractive when we find that they 
can be subsumed within a system 
that is describable by a smaller num- 
ber of categories or principles, as we 
have seen in the matrices of Tables 
1-5. Some readers will ask whether, 
since there are many probable inter- 
correlations among the factors, a 
small set of second-order factors will 
not suffice. Granting that we can 
make sufficiently accurate estimates 
of the intercorrelations among the 
factors, which the writer doubts that 
we can do at present, to use only sec-
ond-order-factor concepts would lose
information. This follows from the
fact that where n linearly independ-
ent dimensions are necessary to de-
scribe a domain geometrically, no 
one dimension can be entirely ac- 
counted for by combinations of the 
others. 
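
The geometric argument can be put a little more formally. The following is only a sketch under assumptions not stated in the text: the symbols $f_1, \dots, f_n$ (the first-order factors, taken as linearly independent vectors), $g_1, \dots, g_k$ (a smaller set of second-order factors), and $k$ are introduced here purely for illustration.

```latex
\dim \operatorname{span}\{g_1,\dots,g_k\} \;\le\; k \;<\; n \;=\; \dim \operatorname{span}\{f_1,\dots,f_n\}
\quad\Longrightarrow\quad
\exists\, j:\; f_j \notin \operatorname{span}\{g_1,\dots,g_k\}.
```

If every $f_j$ lay in the span of the $g$'s, the $n$-dimensional space of the first-order factors would be contained in a space of dimension at most $k < n$, which is impossible. On these assumptions, at least one first-order factor carries variance that no combination of the second-order factors reproduces.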

It may be asked whether some of 
the factors listed are not really spe- 
cific factors rather than common fac- 
tors. This is a legitimate question. 
It is not an uncommon experience in fac-
tor analysis to find that what was for-
merly regarded as a single common
factor appears later to split up into 
two or more factors. The “splitting 
up” description is not completely 
accurate. It applies best to the fact 
that a group of tests having a “fac- 
tor” in common later divide into two 
or more groups each defining its own 
common factor. In clear thinking 
about this phenomenon, we must 
keep in mind the distinction between 
“factor” as a mathematical concept 
and “factor” as a psychological con-
cept. The immediate results of a
factor analysis are in terms of mathe-
matical factors. Whether each math-
ematical factor represents a single
psychological factor or a combina-
tion of psychological factors has to
be determined by interpretation and
by further experimental work ap-
plied to the designing of new factor
analyses. Eventually we reach the
stage where further efforts to “split”
a factor fail. Whether this has
brought us to a specific factor in any
particular case can be decided on the
basis of a single criterion. Are the
tests defining this factor essentially
just different forms of the same test?
This cannot always be decided with
certainty, but there is usually little
difficulty in doing so. If we suspect
that any factor is a specific, a new
analysis that includes more obvi-
ously different tests, but tests that
should measure the same common
factor, should be done.

Skepticism was expressed above 
concerning the operation of estimat- 
ing factor intercorrelations. This 
is a somewhat complicated problem 
for which there is as yet no good solu- 
tion. The common procedure in 
vogue at the present time for esti- 
mating factor intercorrelations is to 
do an oblique rotation of axes, lo- 
cate the primary axes and determine 
the cosines of their angles of separa- 
tion. The writer has preferred orthog- 
onal rotations for several reasons. 
Briefly, any particular oblique solu- 
tion to a factor problem is a function 
of several nonpsychological circum- 
stances. For one thing, it depends 
upon the kind of population tested. 
This is not so serious, but we should 
probably have a different set of fac- 
tor intercorrelations for each age 
group, educational level, cultural 
milieu, etc., and for combinations of 
these. This lack of invariance pre-
cludes making any very general
statements regarding the psycho-
logical interdependencies of factors.

A more serious matter is that
oblique solutions depend upon the
population of tests that we factor
analyze. This is not merely a sam-
pling problem, for the collection of
tests in a battery is never a ran-
domly selected one, and should cer-
tainly not be. Much of this difficulty
hinges on inadequacies of test con-
struction and test administration.
Rarely do we succeed well enough,
either by test construction or by test
administration, in exerting the ex-
perimental controls it would take to
come out with a score that is a pure
[...] was genuine correlation or not. This
kind of result is not uncommon. Until
we succeed in exerting better experi- 
mental controls in testing, we shall 
not have a very good basis for esti- 
mating factor intercorrelations, even 
for a specified population of ex- 
aminees. 

The question always comes up re- 
garding the origins of factors; are 
they inherited or are they acquired, 
to use the common, loose expression 
of this question. The reply is that 
factor analysis alone cannot answer 
this question. So far as factor analy- 
sis is concerned, the factors could all 
be hereditary in origin, or all environ- 
mental, all some weighted combina- 
tion of both heredity and environ- 
ment, or some due to the one and 
some to the other source. It will 
take experimental work of the usual 
types to answer this question. But 
one thing is clear. The question “Is 
intelligence inherited or is it ac- 
quired” makes less sense than it ever 
did. Such a question must be asked 
regarding each and every factor. Fer- 
guson (4) has expressed the interest- 
ing hypothesis that factors are a 
consequence of the principles of 
transfer of learning. Many of them 
may be, to a large extent. The Fergu- 
son hypothesis is akin to a similar one 
expressed earlier in this paper. 

In connection with origins of fac- 
tors, there is also the question of 
when in child development the fac- 
tors make their appearances. To 
the extent that factors are developed
by experience, they would appear at
such ages as the effects of experience
have sufficiently crystallized. To the
extent that heredity is chiefly re-
sponsible for the differentiation of
factors, their appearances should be
detectable when maturation effects
their differentiation. In either case,
the answer is to be determined by ex-
perimental testing and factor analy-
sis at all age levels at which suitable
tests can be administered. Such an-
alyses should be done in populations
very homogeneous with respect to 
age and other features. It can be 
predicted that the structure of the 
intellectual factors for children will 
be found simpler than that for adults. 
It can also be hypothesized that the 
structure for generally superior adults 
will be found more complex than for 
generally inferior adults. 


Implications for Psychological Theory 


It was suggested earlier that al- 
though psychological factors are vari- 
ables among individual differences 
they also indicate psychological func- 
tions within individuals. It is there- 
fore in order to take the factors seri- 
ously as starting points for psycho- 
logical theory. 

There has never been developed a 
comprehensive theory of thinking. 
We have been short of the essential 
concepts needed in the construction 
of such a theory. In view of the great 
variety of thinking abilities (and 
functions) revealed by factor analy- 
sis, the time-honored concepts of 
reasoning, induction, deduction, and 
the like appear even more inadequate 
than before. It seems to be of little 
value to attempt to relate the factors 
to those categories. The factors, in- 
stead, have generated their own cate- 
gories, which have been already pre- 
sented. They are essentially opera- 
tional concepts, since, like factors, 
they refer back to the kinds of tests 
from which factor definitions were 
inferred. 

Although the general picture of the 
thinking factors is not yet sufficiently 
complete or certain to suggest an 
obvious, general theory of thinking, 
the kind of theory that they will 
eventually generate can be seen. 

It is fairly well agreed that think- 
ing is symbolic behavior. It is not 
surprising, then, that certain factors 
have to do with symbols, as such, and
with their utilization and manipula- 
tion. Of all the kinds of symbols 
available to humans in almost any 
culture, words and numbers are 
among those of greatest importance. 
The factors reflect these facts. 

In the operations of thinking, of 
realistic thinking, in particular, the 
factors indicate the important steps 
or processes of discovery, produc- 
tion, and evaluation, often occurring 
roughly in that temporal order. Di- 
vergent thinking may come into the 
picture along with these other phases, 
and auxiliary to them, particularly
when they proceed with some diffi- 
culty. Some divergent-thinking pro- 
cesses are also likely to occur in non- 
realistic thinking, when one is simply 
free to do so and finds it rewarding. 
Since realistic thinking is usually
convergent, particularly when there 
is one right answer, at times there 
may be conflicting divergent-con- 
vergent tendencies, a phenomenon 
that has not been reported, to the 
knowledge of the writer. 

Quite generally, it seems, the
thinking processes of a person may
proceed more or less ably depending
upon the kind of content with which
he is involved: perceived figures,
recognized structures, or conceived
meanings. The distinction that has
sometimes been made between con-
crete thinking and abstract thinking
has foreshadowed the major distinc-
tion here; the distinction between
figural factors and conceptual fac-
tors. The appearance of the third
category, structural, came as a sur-
prise. If it turns out to be important
we have several interesting implica-
tions.

One practical implication of the
structural category is that tests
based upon letter material and the
like may be of limited significance,
if in reality we are interested in pre-
dicting behavior that depends upon
factors in the figural or conceptual 
columns. A more important implica- 
tion has to do with the fact that 
there is a shortage of known factors 
in the structural column. A rather 
direct reason for this may be that 
there has been a bias toward figural 
and verbal test material, with an 
unfortunate slighting of structural 
material. This would not be so un- 
fortunate if it turns out that in our 
civilization not many such factors 
exist, or if they do exist they are of 
relatively little social importance. It 
may be that there is actually more 
structural-type thinking going on 
than we realize and that both psy- 
chologists and educators have failed 
properly to recognize it. In a highly 
technical age, such thinking would 
seem to be important. We might well 
ask ourselves whether we have over-
looked something of importance in
this general area.

The headings of rows in Tables 1-3 
present an unusual list of concepts,
which appear to be more epistemo- 
logical than psychological. Is this 
possibly the kind of concepts that 
we have needed? It may be possible 
to give some of them more psycho- 
logical terminology later, but at 
present they refer to the kinds of 
things that we can know and can 
produce. If such terminology de-
scribes behavior in a significant and
useful manner, it should be wel-
comed and its worth should be recog-
nized. One implication is that the
lists seem to be open to new addi-
tions. Consideration of what cate-
gories might be added to the lists
might turn up some new fruitful
hypotheses regarding unknown fac-
tors and functions.

The subject of problem solving
has come into considerable promi-
nence in recent years. The picture of
the thinking factors has important
implications for problem solving. We
find that there is no one factor that
can be called problem solving. This
is significant. Problem solving is us-
ually a complicated process. It is
clearly indicated that we should stop
looking for any one function or pro-
cess that is the sine qua non of all
problem solving. As the writer has
pointed out elsewhere, many factors,
including perceptual factors as well 
as thinking factors, may be called 
into play, depending upon the nature 
of the problem (12). 

In the list of thinking factors we 
find one factor having to do with the 
ability to recognize that a problem 
exists and another factor that per- 
tains to the diagnosis of the problem. 
The degree of generality of either 
factor is still to be determined. So 
far as we know now, either may be 
restricted to a relatively narrow cate- 
gory of problems. The next steps in 
the attack on problem solving should 
be to make a survey of the variety of 
problems that are common and to at-
tempt to write specifications regard-
ing the factorial abilities that play
significant roles in the solution of
each type of problem. We should
then test these hypotheses by experi-
mental and factor-analytic proced-
ures.

At the beginning of the aptitudes-
project investigation of creativity it
was hypothesized that certain spe-
cial, creative factors would be found,
a few of them being then already
known, some not. The results have
supported most of the hypothesized
factors but not all (20, 31). Because
these factors were investigated within
the arbitrarily designated domain of
creativity, there has been a tendency
to think of them as being the exclu-
sive creative factors. This concep-
tion is not fully correct. Creative
thinking, like problem solving (they
may actually overlap in many cases),
depends upon different combinations
of factors, and the combination of 
factors significant to the task will 
vary from time to time. The problem 
confronting us here, as with problem 
solving, is to recognize the main cate- 
gories of creative production and to 
seek the significant combinations of 
factors involved in them. Although 
certain factors such as ideational 
fluency and originality will carry rela- 
tively more weight, other factors not 
obviously creative may often be sig- 
nificant, as when an invention de- 
pends upon thinking by analogy or 
upon visualization. 

Thinking has many connections 
with learning, and hence the thinking 
factors are of some importance in 
learning investigations and learning 
theory. Thinking is sometimes re- 
garded as a form of learning, for while 
we think we usually learn. Another 
view of the connection is that think- 
ing contributes to learning. The lat- 
ter view is more productive of ap- 
proaches to investigation of the role 
of factors in learning. It is not 
enough to conclude that thinking 
contributes to learning or even to 
state and to test this as a general hy- 
pothesis. The questions raised here 
should be “Where and how does fac- 
tor X contribute to learning?” just 
as it was asked in the preceding para- 
graphs where and how each factor 
contributes to problem solving and 
creative activity. Since problem 
solving and creative activity are 
properly regarded as instances of 
learning, we need only generalize the 
question to make it apply to all learn- 
ing. Fleishman and Hempel (5, 6, 
7) have already provided some ex- 
cellent demonstrations of the roles of 
factors at different stages in the 
learning process for certain psycho- 
motor tasks. This type of investiga- 
tion should be applied more gen- 
erally. Certainly we should have 
outgrown the glib definition that
“intelligence is learning ability.”

The distinction between associa- 
tive and content-memory factors re- 
minds us that not enough attention 
is generally paid to the same distinc- 
tion in studies of learning and mem- 
ory. Learning theory has restricted 
itself almost entirely to the forma- 
tion and retention of associative con- 
nections, leaving out of account the 
learning of substance. 

Speaking of learning suggests the 
practical operation of education. At 
some future time factors should have 
much effect upon educational prac-
tices, in addition to those effects hav-
ing to do with assessment.


[...] occurrences in
connection with intellectual losses
that are associated with [...] or
functional pathologies, and in
providing better definitions
and diagnostic criteria.


Intelligence and Intelligence Tests 

A treatment of the factors of in-
tellect would be incomplete without
considering their implications for the
concept of intelligence and for the 
present and future of intelligence 
testing. Is the concept of intelligence 
still useful? What is the nature of 
current intelligence tests in terms of 
factors? What should the future 
trends in intelligence testing be? 

As to general terminology, the 
term “intellect” can be meaning- 
fully defined as the system of think- 
ing and memory factors, functions, 
or processes. The term “intelligence”
has never been uniquely or satis-
factorily defined. Factor analysis
has fairly well demonstrated that it
is not a unique, unitary phenomenon.
A “general factor,” found by what- 
ever method, is not invariant from 
one analysis to another and hence 
fails to qualify as a unity, independ- 
ent of research circumstances, as 
Vernon has well stated (29). The 
methods of multiple-factor analysis, 
which have been chiefly responsible 
for discovering the factors listed 
above, do not find a general psycho- 
logical factor at the first-order level 
and they find no second-order factor 
that can properly lay claim to the
title of “intelligence.”

The term “intelligence” is useful,
none the less. But it should be used
in a semipopular, technological sense.
It is convenient to have such a term,
even though it is one of the many
[...] concepts we have in ap-
plied psychology. It would be very
[...] for purposes of communi-
cation and understanding, to specify
[...] of intelligences—intelli-
gence A, intelligence B, and so on.
This could be done in terms of the 
combinations of certain intellectual 
factors and their weightings in the 
combinations. 
We have such combinations now in
connection with the intelligence tests
and scales in common use. Let us
consider what kind of combinations
we have in two of the most used in-
telligence scales. A really good factor
analysis of the Stanford Revision of 
the Binet scale would be rather diffi- 
cult, and cannot be done satisfac- 
torily without adding to the analyzed 
battery a liberal number of reference 
tests. This has never been done. The 
best analyses that we have were done 
by Jones (24, 25), who found ten fac- 
tors among 30 selected items. His 
resulting picture is not clear because 
among the 30 items were essentially 
alternate forms of tests (at different 
age levels) and no outside reference 
tests were used. A fully satisfactory 
analysis of the Stanford-Binet items 
would undoubtedly reveal more than 
ten factors present. 
It should be noted that when so
many factors are present, a composite
score based upon all the items can
measure each component only to a
small degree, if they are nearly
equally weighted in the composite.
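
The point about equal weighting can be illustrated numerically. The following is a minimal sketch in Python under assumptions that are not the article's own: the k factors are taken to be mutually uncorrelated, each with unit variance, and the composite is their unweighted sum with no error of measurement. Under those assumptions the correlation of the composite with any one factor is 1/sqrt(k). The function names are hypothetical.

```python
import math


def component_correlation(k: int) -> float:
    """Correlation between an unweighted sum of k uncorrelated,
    unit-variance factor scores and any single one of those factors."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Cov(composite, factor) = 1, Var(composite) = k, Var(factor) = 1,
    # so the correlation is 1 / sqrt(k).
    return 1.0 / math.sqrt(k)


def shared_variance(k: int) -> float:
    """Proportion of composite variance attributable to one factor."""
    return component_correlation(k) ** 2


if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        r = component_correlation(k)
        print(f"k={k:2d}  r={r:.3f}  shared variance={shared_variance(k):.3f}")
```

With ten equally weighted factors the sketch gives a correlation of about .32 between the composite and each factor, that is, about one tenth of the composite variance per factor. Correlated factors or unequal weights would change these figures.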
It can also be predicted that the
factorial composition of the Binet IQ
will be found to vary somewhat from
one age level to another. This feature
may contribute to a small extent to
obtained changes in IQ where sub-
stantial age differences are involved.
As it actually happens, a Stanford-
Binet IQ, or any IQ from a test whose
components are predominantly ver-
bal, is a total score heavily dominated
by the verbal-comprehension factor.
This leaves the other factors with
little or no effective voice in the com-
posite, even though they are repre-
sented in the scale. In nonverbal in-
telligence tests, there is likely to be
less domination by any one factor,
but the nature of the composite var-
ies considerably from battery to bat-
tery.

Analyses of the components of the
Wechsler-Bellevue scale have also
been generally inadequate. The most
adequate analysis has been done by
Davis (3), who utilized a number of
reference tests from outside the
Wechsler battery. He found nine
common factors, six of which are 
probably to be identified with factors 
in the intellectual list. Where stand- 
ard tests of intelligence are widely 
used, it becomes increasingly impor- 
tant to attempt to write the specifi- 
cations for their total scores as well 
as their part scores, so that obtained 
scores of individuals may be most 
meaningfully interpreted. 
Intelligence tests will probably 
continue to be used for some time to 
come much as they are. In order to 
use them most wisely and to extract 
the greatest amount of information 
from their scores, the specification of 
such scores in terms of known factors 
is one important improvement that 
could be made. The other great step 
toward improvement in intelligence 
testing would be to emphasize more 
than at present some of the socially 
important factors that have to do 
with productive thinking. The 
knowledge of the factors of this kind 
and of the kinds of tests that meas- 
ure them is largely available. Only 
by this kind of extension of intelli- 
gence testing can we do adequate 
justice to adult, human intellect. 
Other extensions may also be very 
useful, for we are a long way from 
complete coverage of the intellectual 
factors in present tests. For differ- 
ential prediction, and this includes 
the operation of vocational guidance, 
only single-factor scores will do com-
plete justice in the description of in-
dividuals. As a necessary prelude
to the use of factor measures for such
purposes, we need innumerable vali- 
dation studies in which factors play 
an important role, studies such as 
those by Hills and others (23, 18). 


SUMMARY 


A listing of the factors that can be 
regarded as intellectual was made, 
including those reported in French’s
summary of factors (8) appearing in
1951 and those reported since that
time. Of approximately 40 such fac-
tors, seven are memory factors and
the remaining ones have to do with
thinking. An attempt was made to formulate
a system into which the factors seem
to fall. The thinking factors were
categorized under the general head-
ings of cognition (discovery), produc-
tion (convergent thinking and diverg-
ent thinking), and evaluation. The
factors in each group can be arranged
according to three kinds of content of
thinking—figural, structural, and con-
ceptual. In the cognition and produc-
tion groups, a second principle of
classification, cutting across the con-
tent principle, pertains to the kinds
of things discovered or produced. In
the memory group the second prin-
ciple pertains to the kinds of things
remembered—associations or sub-
stance. The result is a matrix of fac-
tors in each of the areas, with vacant
cells. The vacancies suggest hy-
potheses for undiscovered factors.

In the general discussion, implica-
tions of the factors and their system
were pointed out for factor theory
and practice, for general psychologi-
cal theory, and for the concept of in-
telligence and practices of intelligence
testing.

It is by now generally recognized
that all forms of psychotherapy yield 
successful results with some patients 
and that these successes depend to an 
undetermined extent on factors com- 
mon to many types of relationship 
between patient and therapist. This 
poses a knotty problem for propo- 
nents of various specific forms of psy- 
chotherapy who are convinced that 
their successes result from their 
particular theory or technique and 
wish to convince others of this. As a
result, problems of research design
in psychotherapy have been receiving
more and more critical attention in
recent years, especially with reference
to controls (6, 11, 20, 23, 24, 25, 27,
31, 34, 35, 38, 39).

Certain general aspects of the psy-
chotherapeutic relationship seem very
similar to those responsible for the so-
called placebo effect, which is well
known to investigators of the thera-
peutic efficacy of medications. The
purpose of this paper is to describe
the placebo effect, discuss some of its
implications for the evaluation of
psychotherapy, and make some rec-
ommendations concerning research
in psychotherapy based on
these considerations.


THE PLACEBO EFFECT 


[...] participated in two
studies of the effec-
tiveness [...] symptomatic
[...] outpatients.
Both studies involved the 
administration of a Placebo, an inert 
agent outwardly indistinguishable 
from the agent being tested, as well 
as drugs. The physician never knew
whether he was giving the patient
drug or placebo. The patients were
told that a new medicine had become
available which, it was thought,
might help them. The physician
rated symptoms on a 4-point scale of
distress, with high reliability. In
both studies a significant reduction of
distress accompanied the taking of
placebos, as shown in Table 1.
This phenomenon occurs with [...]
regularity, not only with respect to
the kinds of symptoms usually asso-
ciated with psychologic illness, but
with others as well. For example, in
a study of vaccines for the common
cold, there was found a reduction in
the number of yearly colds of 55 per
cent among those given vaccine and
of 61 per cent among a control group
who received injections of [...]
sodium chloride solution (4). Hi[...]
(15) found placebos as effective as
other agents in inhibiting the [...]
reflex. Wolf and Pinsky (37) studied
medical outpatients suffering from
peptic ulcer, migraine, muscle ten-
sion, headache, and tight muscles in
the extremities. All were also tense
and anxious. Twenty to thirty per
cent felt better while taking placebos.
Lasagna et al. (19) gave 1 ml. of
saline by subcutaneous injection to
surgical patients suffering from steady,
severe wound pains and found that
30 to 40 per cent reported a satis-
factory relief of pain. In a study by
Jellinek (18) 60 per cent of 199 sub-
jects with chronic headaches received
relief from a placebo on one or more
occasions.

TABLE 1
SYMPTOM DISTRESS BEFORE EXPERIMENTS AND AFTER A TRIAL ON PLACEBOS

                                 Mean distress scores
Study       N    Drug tested    Before        After      Significance
                                experiment    placebo    of difference
1st study   17   Mephenesin     25.58         15.88      .01
2nd study   16   Reserpine      34.06         24.69      .02
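
The size of the changes in Table 1 may be easier to judge as proportions of the initial scores. The sketch below, in Python, computes the percentage reduction in mean distress for each study from the tabled means. The function and variable names are illustrative only, and the calculation adds nothing to the significance levels, which are taken as reported.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage drop from the pre-experiment mean to the post-placebo mean."""
    if before <= 0:
        raise ValueError("the initial mean must be positive")
    return 100.0 * (before - after) / before


# Mean distress scores copied from Table 1: (before experiment, after placebo).
table_1 = {
    "1st study (mephenesin trial)": (25.58, 15.88),
    "2nd study (reserpine trial)": (34.06, 24.69),
}

if __name__ == "__main__":
    for study, (before, after) in table_1.items():
        print(f"{study}: {percent_reduction(before, after):.1f} per cent reduction")
```

On these figures the mean distress score fell by roughly 38 per cent in the first study and roughly 28 per cent in the second while the patients were taking placebos.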


The placebo effect is not always 
favorable, but may also result in un- 
desirable, distressful reactions. As 
far back as 1933, Diehl (3) using lac- 
tose placebos as a control for a variety 
of medications taken by mouth, 
found that some of his subjects re- 
ceiving placebos developed nausea, 
faintness, and diarrhea. Sometimes 
this “toxic response” to placebos may 
even attain major proportions. Wolf 
and Pinsky (37) tell of one patient 
who had “overwhelming weakness,
palpitation, and nausea within 15
minutes of taking her tablets.” In
another, “a diffuse itchy erythema-
tous maculopapular rash developed
after ten days of taking pills. A skin
consultant considered the eruption to
be typical dermatitis medicamentosa.
After use of the pills was stopped, the
eruption quickly cleared.” A third
patient developed epigastric pain fol-
lowed by watery diarrhea, urticaria,
and angioneurotic edema of the lips
within ten minutes of taking her pills.
One of our own patients, who had
been tolerating a chronic syphilo-
phobia fairly well, became acutely
agitated shortly after placebo inges- 
tion, bemoaning what the pills had 
done to him, and required hospitaliza- 
tion shortly thereafter. 

Wolf and Pinsky (37) found that
placebos produced more improve-
ment in subjective than objective
manifestations of anxiety and ten-
sion, but objective changes also oc-
cur. In our second study (22), 69
per cent of our patients showed de-
creased blood pressure and pulse
readings following placebo, 19 per
cent showed increased blood pres-
sure, and 25 per cent showed a rise in
pulse rate. Wolf (36) demonstrated
clearly and convincingly that actual
end-organ changes can follow placebo
administration. This demonstration
was made in a series of studies on the
now-celebrated Tom, a human sub-
ject with a large gastric fistula, in
whom it was possible to observe di-
rectly the gastric mucous membrane,
correlating changes in color and tur-
gidity with simultaneous measure-
ments of gastric secretion and motor
activity.

The placebo effect may actually 
reverse the normal pharmacologic 
action of a drug. For example, Wolf 
reports that Tom was repeatedly 
given Prostigmine, which induced
abdominal cramps and diarrhea, as well
as hyperaemia, hypersecretion, and
hypermotility of the stomach. Sub-
sequently, the same response oc-
curred not only to tap water and lac-
tose capsules, but also to atropine
sulfate, which usually has an inhibit-
ing effect on gastric function. A
pregnant patient with excessive vom-
iting showed the usual response of
nausea and vomiting to ipecac. These
manifestations were accompanied by
cessation of normal gastric contrac-
tions. When ipecac was given
through a tube with strong assurance
that it would relieve her vomiting,
gastric contractions were resumed at
the same interval after ingestion of
the drug that they would normally
have ceased, and her nausea and
vomiting were relieved.

The placebo effect, in short, can 
be quite powerful. It can signifi- 
cantly modify the patient's physio- 
logical functioning, even to the ex- 
tent of reversing the normal pharma- 
cological action of drugs; and, as will 
be discussed below, it may be endur- 
ing. Placebo effects cannot be dis- 
missed as superficial or transient. 
They often involve an increased sense 
of well-being in the patient and are 
manifested primarily by relief from
the particular symptomatic distress
for which the patient expects and re-
ceives treatment. Thus, the relief
of any particular complaint by a
given medication is not sufficient
evidence for the specific effect of the
medicine on this complaint unless it
can be shown that the relief is not 
obtained as a placebo effect. 


IMPLICATIONS OF THE PLACEBO 
EFFECT FOR RESEARCH IN 
PSYCHOTHERAPY 
The giving of an [...]
have certain me[...]
stress caused by [...]
Wolf believes the [...]
his patients [...]
[...] this or that
effect would result.” The degree of
the patient's conviction might be ex-
pected to be influenced by his previ-
ous experiences with doctors, his
confidence in his physician, his sug-
gestibility, the suggestibility-enhanc-
ing aspects of the situation in which
the therapeutic agent is being ad-
ministered, and his faith in or fear of
the therapeutic agent itself. These
attitudes are obviously relevant to
psychotherapy.


Psychotherapists have theories of
personality and psychotherapy and
plan their therapeutic actions in the
belief that these are the active agents
which produce the desired results.
Any favorable changes in patients
consequent to a course of psychother-
apy tend to be cited as evidence for
the validity of the theory of person-
ality and neurosis which underlies the
rationale of the psychotherapy. In
view of the above discussion it may
well be that the efficacy of any par-
ticular set of therapeutic operations
lies in their analogy to a placebo in
that they enhance the therapist's
and patient's conviction that some-
thing useful is being done. Patients
entering psychotherapy have various
degrees of belief in its efficacy, and
this may be an important factor in
the results of therapy, but this has
not been studied, to our knowledge.
We know that the authoritarian at-
titude of the physician can produce
this conviction in some patients.

At first glance the attitudes found
by Fiedler (8, 9) to characterize ex-
perienced psychotherapists, viz. feel-
ings of empathy for and closeness to
the patient, an undemanding atti-
tude, security, and the ability to “un-
derstand” the patient, seem dia-
metrically opposed to the authoritar-
ian attitude. It may be, however,
that the therapeutic efficacy of these
attitudes lies primarily in their abil-
ity to increase the confidence of cer-
tain patients in the ability of the
therapist to help them. Lack of such
confidence may be one of the reasons
why patients of lower socioeconomic
status fare less well in psychotherapy
than patients higher in this scale ([...]
9), a talking therapy seeming to be
beyond their comprehension and con-
trary to their conception of the doc-
tor-patient relationship.

In this connection, the role of sug-
gestion in psychotherapy has been
emphasized for years, especially in 
therapies utilizing hypnosis, but sug- 
gestion effects have been thought 
by many since Freud to be superficial 
and transitory. We know of no ex- 
perimental study which demonstrates 
that therapeutic effects based on in- 
sights or perceptual reorganization, 
which may also be suggested, are less 
superficial or less transitory. 

It may be pointed out parentheti-
cally that conviction of the helpful-
ness of therapy need not be equated 
with “motivation for therapy,” which 
was investigated by Grummon (13) 
and Dymond (5) and found to have 
little relationship to success in psy- 
chotherapy. Patients are often suf- 
ficiently distressed to be strongly 
motivated to receive help, yet have 
little faith that a procedure such as 
psychotherapy can help them. 

The similarity of the forces operat-
ing in psychotherapy and the placebo
effect may account for the high con- 
sistency of improvement rates found 
with various therapies, from that 
conducted by physicians without psy- 
chiatric training to intensive psycho- 
analysis (7). This explanation gains 
plausibility from the fact that re- 
ported improvement rates for vari- 
ous series of neurotics treated by dif- 
ferent forms of psychotherapy hover 
around 60 per cent (1). This is the 
same as that reported for the placebo 
effect in illnesses in which emotional 
components may play a major role 
such as “colds” (3) and headaches 
(18). 

To show that a specific form of 
treatment produces more than a non- 
specific placebo effect it must be 
shown that its effects are stronger, 
last longer, or are qualitatively dif-
ferent from those produced by the
administration of placebos, or that it
affects different types of patients.
Our knowledge of all these matters
is still fragmentary, but some begin- 
nings have been made. 

With respect to the strength and 
qualitative nature of the effects of 
therapy, one line of endeavor has 
been to study the physiological 
changes occurring during psycho- 
therapy. Since physiological meas- 
ures usually used to provide evidence 
of resistance or frustration (26, 33) or 
similar psychological states during 
psychotherapy (28) may also be in- 
fluenced by the placebo effect, one 
cannot conclude that demonstration 
of such physiological changes implies 
a greater depth of therapy or a more 
profound reorganization of the per- 
sonality, unless we are willing to 
equate the placebo effect with such 
reorganization. 

With respect to the duration of 
improvement, if it could be shown 
that the placebo effect is of shorter 
duration than changes specific to a 
given psychotherapy, this would pro-
vide one kind of evidence favoring
that theory of psychotherapy. As 
far as we know, no study of the limits 
of duration of the placebo effect has 
been made. Our experiment with 
mephenesin vs. placebo covered four 
two-week periods. Figure 1 shows 
the curves for both agents for the
eight weeks.

FIG. 1. [...] PLACEBO ON SYMPTOMATIC DISTRESS OVER AN
8-WEEK PERIOD.

Total patients = 17. At the 2-, 4-, 6-, and
8-week intervals, N for placebo = 11, 6, 10,
and 7 respectively, while N for mephenesin
= 6, 11, 7, and 10 respectively. For the 2-
and 4-week periods, the dosage of mephenesin
was 3 gms. per day; for the 6- and 8-week
periods, 9 gms. per day.

Figure 1 shows that the greatest
decrease in distress following place-
bos was felt during the first two-week
trial period. After that, a slight but
statistically insignificant rise in dis-
tress occurred; and, at the end of
eight weeks, the placebo effect was
about as great as after two weeks.
Unfortunately, our data yielded no
information on how much longer it
might have endured. If the effect is
analogous to the relief of pain by
placebos in patients with surgical
wounds, we should expect it eventu-
ally to diminish. Lasagna et al. (19)
found that as placebo therapy of such
patients continued the relief ex-
perienced decreased.

Although the number of patients
is too small to justify [...]
it is intriguing that the first dose of
[...] by the psy[...]
disturbs the pa[...]
[...] counteract [...]
psychotherapy [...]

It would be helpful to know if
[...] differentiated ac-
cording to attributes which predis-
posed them to a positive or negative
placebo effect. If patients who im-
proved with a particular form of psy-
chotherapy were all known to be
positive placebo reactors, then the
improvement could not be attributed
to the specific form of treatment. If,
however, they were known not to be
positive placebo reactors, then any
demonstrated improvement would
constitute evidence of efficacy specific
to the form of psychotherapy.


There is little known, however,
with regard to the attributes of
placebo reactors. Lasagna et al. (19)
have made the first attempts to in-
vestigate this problem and report
some attitudes and Rorschach cate-
gories which differentiated their re-
actors (N=11) from their nonre-
actors (N=16). However, only [...]
per cent of their patients were con-
sistent reactors, i.e., showed the ef-
fect with every placebo dose, and [...]
per cent were consistent nonreactors,
while 55 per cent showed the effect
on some occasions but not on others.
This contrasts with the findings of
Jellinek (18) whose patients with
headache were, for the most part,
either in the always-relieved group or
the never-relieved group, with only a
small percentage of patients showing
inconsistency of response. The ap-
parent contradiction in findings may
perhaps result from the difference in
the cause of the pain in the two studies
or from other factors. In any case it
indicates that the problem is a com-
plex one needing much more study.

In the light of these considerations,
any method of demonstrating the
specificity of response to a given type
of psychotherapy would have to pro-
vide an adequate control design. So
far as we know, the study which has
paid closest attention to the question
of controls in research in psychother-
apy is that of Rogers and his col-
leagues (31). They employed two dif-
ferent kinds of control groups. One
was a group of nonclients who were
simply given a battery of tests before
and after specified time periods. The
other was a group of clients who were
required to wait a specified period of 
time before beginning therapy. This 
group was tested at the beginning 
and end of the wait period, at the end 
of therapy, and after a follow-up pe- 
riod. 

These procedures do not control 
for the placebo effect since neither 
control group was being subjected 
to any special procedures which could 
produce a reasonable expectancy in 
control subjects that certain changes 
should occur. The experimental 
group, however, could be expected 
to anticipate certain effects merely as 
a consequence of participating in the 
client-therapist interviews. There- 
fore, even though favorable changes 
could be demonstrated in their cli- 
ents, the question of whether these 
were placebo effects could not be 
answered from such research design 
unless additional information were 
provided.

If we do not control for nonspecific 
factors like the placebo effect, we 
cannot know whether effects pre- 
dicted from a theory lead to or result 
from improvement based on the non- 
specific effect. Butler and Haigh (2), 
for example, report an increased cor- 
relation of perceived self with ideal 
self following client-centered therapy. 
The implicit inference is that the 
specific therapeutic method leads to
this increased correlation which, in 
turn, contributes to amelioration of 
disability and distress. 

It is conceivable, though, that as a 
result of a nonspecific placebo effect 
the client feels less disabled and dis- 
tressed which, in turn, leads him to 
describe himself as more like his ideal 
self. Rogers’ (30) findings of greater 
emotional maturity in successfully 
treated cases may be similarly ex-
plained, clients feeling less disabled
and distressed due to a nonspecific
placebo response and behaving con-
sequently in ways which are less anx-
iety-determined and which are seen
as more mature by others. 

We would propose that the follow- 
ing conditions are optimal in planning 
research in psychotherapy: 

1. A theory of personality and psy- 
chological distress (neurosis, malad- 
justment, etc.). 

2. Predictions of effects in the pa- 
tient or client consequent to psycho- 
therapy, in accord with the theory. 

3. Demonstration of a relationship 
between the predicted effects and 
some criterion of improvement. 

4. Demonstration that the pre-
dicted effects and their relationship
to the improvement criterion are not 
due primarily to the patient’s convic- 
tion that therapy will help him. This 
will permit greater confidence that 
the relationship found is specific to 
the therapeutic technique derived 
from the theory. 

Ideally, these conditions should 
obtain both for process and outcome 
research. There seems to be general 
agreement with regard to the first 
two conditions although Mackinnon 
(21) has some reservations about 
beginning with a theory rather than 
a hunch. Gordon et al. (12) have 
come to question the third condition, 
at least with respect to a “global” 
criterion of improvement. 

The fourth condition has not been 
met in any research of which we are 
aware. It is not possible to set up an 
experiment precisely analogous to 
comparison of a medication with a 
placebo because there is no such thing 
as inert psychotherapy in the sense 
that placebos are pharmacologically 
inert. However, it may be possible to 
study the possible specific effects of 
any particular form of therapy by the 
use of a matched control group par- 
ticipating in an activity regarded as 
therapeutically inert from the stand- 
point of the theory of the therapy 
being studied. That is, it would not
be expected to produce the effects 
predicted by the theory. The 
“placebo psychotherapy” in this 
sense would be analogous to placebos 
in that it would be administered 
under circumstances and by persons 
such that the patients would expect 
to be helped by it.

Let us say that our theory is psy- 
choanalytic and our predicted effect 
is an increased correlation between 
the moral values of the patient and 
the therapist (superego identifica- 
tion) and that we also expect an as- 
sociation between the increased cor- 
relation and a criterion of improve- 
ment (32). According to the theory, 
there is no reason to believe that con- 
trol patients receiving, for example, 
relaxation therapy (17) will show the 
increased correlation of moral values 
with their therapist's moral values, 
nor should they show as much or as 
lasting improvement as the patients 
receiving psychoanalytic therapy of
equal length. Such a design would
constitute a fair test of the hypothesis
based on the theory. In comparative
studies where one type of psychother-
apy is tested against another, [...]
between them in [...]
and therapists to [...]
are assigned.


SUMMARY AND CONCLUSIONS 
The literature on the therapeutic 
efficacy of drugs compared with
placebos is briefly reviewed, and its
relevance for research in psychother-
apy considered. It is concluded that
improvement under a particular form
of psychotherapy cannot be taken as
evidence for: (a) correctness of the
theory on which it is based; or (b)
efficacy of the specific technique used,
unless improvement can be shown to
be greater than or qualitatively dif-
ferent from that produced by the
patients’ faith in the efficacy of the
therapist and his technique—“the
placebo effect.” This effect may be
thought of as a nonspecific form of
psychotherapy and it may be quite
powerful in that it may produce end-
organ changes and relief from distress
of considerable duration.

To show that a specific form of
psychotherapy based on a theory of
personality and neurosis produces
results not attributable to the non-
specific placebo effect it is not suffi-
cient to compare its results with
changes in patients receiving no
treatment. The only adequate con-
trol would be another form of therapy
in which patients had equal faith, so
that the placebo effect operated
equally in both, but which would not
be expected by the theory of therapy
being studied to produce the same ef-
fects. We need to learn more about
the nature of the placebo effect, the
conditions giving rise to it, and the
attributes of patients most suscepti-
ble or resistant to it so that we may
obtain a better understanding of the
role of nonspecific factors in psycho-
therapy.

In recent years a number of studies
involving human Ss have been de- 
voted to testing the implications of 
certain Hullian notions concerning 
the relationship between performance 
in learning situations and level of 
total effective drive (D). In these in- 
vestigations drive has been defined in 
terms of scores on a manifest anxiety 
scale (41). In view of the growing 
experimental literature concerning 
these hypotheses since their initial 
Statement by Taylor (40) and Taylor 
and Spence (43), an attempt to out- 
line the theory as it is presently con- 
ceived by the Iowa group and to 
evaluate the evidence concerning it 
seems to be in order.

Before proceeding with these mat-
ters, however, certain misunderstand-
ings which have arisen concerning the
purpose of this work should be men-
tioned. First, although groups have
been selected exclusively on the basis
of scores on the Manifest Anxiety
Scale (hereafter designated as MAS),
the interest of the Iowa group has
not been in investigating anxiety as
a phenomenon, but rather in the role
of drive in certain learning situations.
The assumption has been made that
anxiety scores are related in some
manner to drive level, but in terms
of the major theoretical interests of
this group, any other acceptable spec-
ification of drive (e.g., hunger) could
be used in experimental tests of the
hypotheses about the effect of drive
level. Further, as Farber (6) has
pointed out, no attempt has ever been
made to claim that the only difference
between individuals receiving differ-
ent scores on the MAS is in drive
level or that all performance differ-
ences could be explained by drive.
Undoubtedly there are many char- 
acteristics other than drive level on 
which anxious and nonanxious Ss 
differ; the investigation of these addi- 
tional properties of anxiety groups 
and their influence on performance is 
certainly both legitimate and impor- 
tant, but it simply has not been the 
interest of the proponents of the drive 
theory. 

A second point that should be clari- 
fied has to do with the MAS. The 
construction of the test was not 
aimed at developing a clinically useful 
test which would diagnose anxiety, 
but rather was designed solely to se- 
lect Ss differing in general drive level. 
Thus the question of the scale’s 
“validity” (i.e., its agreement with 
clinical judgments) is in a sense irrele- 
vant to the experimental purposes for 
which the test was developed. In 
light of this, the test might better 
have been given a more noncommittal 
label, such as a measure of emotion- 
ality, although the fact that the items 
on the scale were selected by clini- 
cians as referring to manifest anxiety 
as it is described psychiatrically does 
not make the title completely inap- 
propriate nor a relationship between 
clinical judgments and MAS scores 
unexpected. Certainly the generality 
of the experimental findings with the 
MAS would be increased if correla- 
tions were found with other defini- 
tions and such attempts will be dis- 
cussed in a later section. However, 
regardless of the results of such stud- 
ies, it should be clearly understood 
that “manifest anxiety” has been defined
operationally only in terms of
test scores and will be so employed, 
unless otherwise indicated, in the 
present paper. 


DRIVE THEORY 


As stated earlier, the purpose of 
the Iowa group has been to investi- 
gate the effects of varying drive level 
on performance in learning situa- 
tions. Actual experimentation has 
involved two independent problems: 
(a) specification of the conditions un- 
der which drive differences are said 
to appear, and (b) the theory con- 
cerning the effects of drive level on 
behavior once drive has been aroused. 
The first problem concerns the postulated
relationship between the MAS
and drive level, the second between 
drive (or anxiety) level and perform- 
ance in various situations. Since the 
two are separate matters, an outline 
of the theory concerning the influence 
of drive will be given first and the 
hypothesized relationship between 
drive and MAS scores considered at 
a later point. 

According to Hull (15), all habits
(H) activated in a given situation
combine multiplicatively with the
total effective drive state (D) operating
at the moment to form excitatory
potential E [E = f(H × D)].
Total effective drive, in the Hullian system,
is determined by the summation of
all extant need states, primary and
secondary, irrespective of their source
and their relevancy to the type of
reinforcement employed. Since response
strength is determined in part
by E, the implication of varying
drive level in any situation in which
a single habit is evoked is clear: the
higher the drive, the greater the value
of E and hence of response strength.
Thus in simple noncompetitional experimental
arrangements involving
only a single habit tendency the performance
level of high-drive Ss should
be greater than that for low-drive
groups.
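
The multiplicative assumption can be illustrated with a minimal numerical sketch. The function name and the habit and drive values below are hypothetical and chosen only for illustration; the function f, which the theory leaves unspecified, is assumed here to be the identity.

```python
def excitatory_potential(habit: float, drive: float) -> float:
    """Return E = f(H x D), with f assumed to be the identity (illustration only)."""
    return habit * drive


H = 0.5                           # hypothetical habit strength
low_drive, high_drive = 1.0, 2.0  # hypothetical drive levels

e_low = excitatory_potential(H, low_drive)
e_high = excitatory_potential(H, high_drive)

# With a single habit, the higher drive level yields the larger E.
print(e_low, e_high)
```

Under these assumptions any increase in D raises E for a fixed H, which is the basis of the prediction for the single-habit case.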

Higher drive levels should not,
however, always lead to superior performance
(i.e., greater probability of
the appearance of the correct response).
In situations in which a
number of competing response tendencies
are evoked, only one of which
is correct, the relative performance of
high and low drive groups will depend
upon the number and comparative
strengths of the various response
tendencies. Predictions concerning
the performance of the groups in such
complex tasks involve the introduction
of additional Hullian concepts:
oscillatory inhibition (O) and threshold
(L).

The concept of O was introduced
by Hull (15) in an attempt to allow
for statement, within his system, of
the intra-individual variability in
behavior that occurs, presumably,
because of uncontrolled variations
from instant to instant within the
organism and in his environment.
The value of O is said to vary from
moment to moment, the distribution
of O values for a group of (like) individuals
on any trial forming a normal
probability function. O is further assumed
to play an inhibitory role, its
value being subtracted from excitatory
potential (E), thus yielding
momentary excitatory potential (Ē).
In order for Ē to activate a response,
it must attain a minimum or threshold
value (L), a value that is presumably
the same for all similar habit
tendencies evoked in a given situation.
Thus R = f(Ē) = f(E − O − L).
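
A minimal sketch of the threshold relation follows. It assumes, for illustration only, that a response occurs whenever E − O exceeds L and that O is normally distributed with a stated mean and standard deviation; the function name and all numerical values are hypothetical.

```python
import math


def response_probability(e: float, threshold: float,
                         o_mean: float, o_sd: float) -> float:
    """Probability that E - O exceeds L when O ~ Normal(o_mean, o_sd).

    P(E - O > L) = P(O < E - L) = Phi((E - L - o_mean) / o_sd),
    where Phi is the standard normal distribution function.
    """
    z = (e - threshold - o_mean) / o_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical values: raising E (e.g., through higher drive) raises the
# probability that the momentary potential is suprathreshold.
p_weak = response_probability(e=1.0, threshold=0.5, o_mean=0.5, o_sd=1.0)
p_strong = response_probability(e=2.0, threshold=0.5, o_mean=0.5, o_sd=1.0)
print(p_weak, p_strong)
```

When E − L equals the mean of O the probability is one half; it approaches unity as E grows relative to L and the oscillation.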

In any task in which a stimulus 
tends to evoke a number of compet- 
ing responses the response that will 
appear on a given occasion will be the 
one with the highest suprathreshold 
momentary excitatory strength (Ē)


DRIVE THEORY AND MANIFEST ANXIETY 305 


at that moment. Other things being 
equal, of course, the response with 
the greatest H and hence E value will 
have a greater probability of occur- 
ring than any other response. 

Adding the notion of differing drive 
level to this conception, we see that 
the probability of appearance of the 
correct response involves an interac- 
tion between drive level and the num- 
ber and comparative strengths of the 
correct and incorrect tendencies. 
When the correct response is weaker 
(i.e., has less H) than one or more of
the competing response tendencies, 
high-drive groups should be inferior 
in performance to low-drive Ss. That 
is, because of the multiplicative rela- 
tionship between habit strength and 
drive, the stronger incorrect tenden- 
cies gain relatively more E than the 
correct tendency in the case of high 
drive Ss than in low drive, thus lead- 
ing to a greater probability of occur- 
rence of one of the stronger incorrect 
responses in the high-drive group. 
Further, the possibility exists that 
under a high-drive level new competing
responses with very weak habit
strengths may be brought over the
threshold value of E with the conse- 
quence that the probability of occur- 
rence of the correct response is low- 
ered relative to that in a low-drive 
condition. 

At the other extreme, the correct
response tendency may be highest in
the hierarchy and relatively strong
when compared to the incorrect. In
such a situation, which is comparable
to the case in which but a single habit
is aroused, the E value for the correct
response would be relatively greater
than the other responses in the hier- 
archy for the high-drive group than 
for the low-drive, leading to the pre- 
diction of the superiority of perform- 
ance of such subjects. 
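
The two extreme cases just described can be sketched with a small simulation. This is an illustration under stated assumptions, not the procedure of any study reviewed here: f is taken as the identity, O is drawn independently for each response from a normal distribution with mean zero, and the habit strengths, drive levels, threshold, and function name are all hypothetical.

```python
import random


def p_correct(habits, correct, drive, threshold=0.0, o_sd=0.5,
              trials=20000, seed=0):
    """Estimate the probability that the correct response occurs.

    On each simulated trial every response tendency receives a momentary
    potential H * D - O. The correct response counts as occurring only if
    it is the strongest tendency and exceeds the threshold.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    wins = 0
    for _ in range(trials):
        momentary = [h * drive - rng.gauss(0.0, o_sd) for h in habits]
        strongest = max(range(len(habits)), key=lambda i: momentary[i])
        if strongest == correct and momentary[strongest] > threshold:
            wins += 1
    return wins / trials


weak_correct = [0.3, 0.7]    # correct habit (index 0) weaker than its competitor
strong_correct = [0.7, 0.3]  # correct habit (index 0) dominant

for label, habits in (("weak", weak_correct), ("strong", strong_correct)):
    low = p_correct(habits, 0, drive=1.0)
    high = p_correct(habits, 0, drive=4.0)
    print(label, round(low, 3), round(high, 3))
```

With these hypothetical values the high-drive estimate falls below the low-drive estimate when the correct habit is the weaker one and rises above it when the correct habit is dominant, which is the interaction the theory describes.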

It should be obvious, then, that
maximum inferiority of high-drive Ss
would be expected when a large num- 
ber of competing tendencies are pres- 
ent and the correct tendency is both 
relatively weak and low in the hier- 
archy. As the strength of the correct 
tendency increases relative to the in- 
correct, high-drive groups should be- 
come less inferior and eventually 
superior in performance to low-drive 
groups. The exact point of equality 
would be difficult to specify. Even 
when the correct response is highest 
(though not strongly dominant) in 
the hierarchy, high-drive Ss could 
still conceivably be inferior in some 
instances since a greater number of 
suprathreshold tendencies could more 
than offset the advantage of the rela- 
tively higher E value of the correct 
response for these individuals.¹

An important consideration that 
should be noted about making pre- 
dictions concerning the effect of drive 
level upon performance in actual ex- 
perimental situations is that a be- 
havioral analysis of the situation 
must have been made; only in experi- 
mental arrangements in which the re- 
sults, independent of drive level, per- 
mit statements in terms of competing 
S-R tendencies are deductions from 
the theory possible. While the ma- 
jority of investigations designed to 


¹ In a recent review Child (3) incorrectly
interpreted the theoretical analysis outlined 
above as involving the sudden introduction of 
O and L for the situation in which the correct 
response is highest in the hierarchy. These 
concepts are of course assumed to be operating 
in all situations, including the noncompeti- 
tional one in which but a single response 
tendency is being evoked. No appeal was 
made to these constructs in the latter instance, 
however, since their inclusion would not affect 
the predictions. Mention might also be made 
of other constructs in the Hullian system 
(e.g., I, V, K, etc.): it has been assumed that
these are of equal value for all drive groups 
and that a consideration of their values would 
not result in changing any prediction. 


306 JANET A. 


test implications of these derivations 
concerning drive level have utilized 
tasks for which analyses in S-R terms 
had already been made and found to 
be useful, occasionally an experiment 
appears in which the investigator at- 
tempts to evaluate the total theory 
by comparing groups on a task which 
is poorly understood (and for which 
little or no rationale is presented) or 
which clearly involves the introduc- 
tion of variables not included in the 
theory. The accumulation of empiri- 
cal evidence concerning the perform- 
ance of different groups in any situa- 
tion or attempts to incorporate addi- 
tional variables within any theoreti- 
cal framework are certainly to be encouraged,
but statements that the
results of such studies refute or confirm
theoretical ex


posed by the theory are met.


DRIVE AND ANXIETY 
The use of the 


alternative hypotheses have been entertained concerning
the conditions under which
emotionality is evoked. One is that
test scores reflect differences in a
chronic emotional state so that in- 
dividuals scoring high on the scale 
tend to bring a higher level of emo- 
tionality or anxiety “in the door” 
with them than do Ss scoring at 
lower levels (40). A second alterna- 
tive conception is that MAS scores 
reflect different potentialities for anxiety
arousal, high scoring Ss being
those who tend to react more emotionally
and adapt less readily to
novel or threatening situations than
do low scorers (28, 37). According 
to the first hypothesis differences 
among anxious and nonanxious groups 
(providing other conditions imposed 
by the theory are met) should be 
found whether or not there is any 
“threat,” in the form of noxious stim- 
ulation, fear of failure or the like, in 
the situation. Thus, for example, the 
performance of anxious Ss should be
superior to the nonanxious in both
classical defense conditioning, in which
a noxious stimulus is employed, and
in reward conditioning into which no 
objective threat has been introduced. 
In the case of the second conception, 
differences would be expected in the 
performance of anxiety groups only 
in those situations in which some 
threat is present. Should this be the 
correct conception, exact specifica- 
tion of the conditions thought to be 
sufficient to evoke anxiety would be 
necessary in order to test hypotheses 
concerning the role of drive. Available
evidence suggests that the magnitude
of differences among groups
may be related to the level of noxious
stimulation employed (37), or to
stress-producing instructions (10, 19),
suggesting that differences in drive
level among groups may depend at
least in part upon situational factors.
However, the picture is complicated
by the results of a number of studies
in which differences among anxiety
groups have been found in the absence
of noxious stimulation or instructions
designed to produce stress
(8, 24, 25, 26, 42).

Most investigators have not explicitly
considered this issue, assuming
either that anxiety scores reflect
a chronic level of emotionality or
that factors are present in the typical
laboratory experiment that result in
different anxiety levels among groups.
For purposes of evaluating those studies
in which degree of stress has not
been under investigation, the as- 
sumption will tentatively be made 
here that in all situations, individuals 
scoring high and low on the anxiety 
scale will differ in drive level, for 
whatever reason. The evidence more 
directly concerned with the condi- 
tions of anxiety-arousal will be con- 
sidered at a later point. 


EXPERIMENTAL EVIDENCE 


Classical conditioning. Classical
conditioning is said to be a noncompetitional
situation in which but a
single response tendency is being acquired;
theoretical expectation therefore
is that anxious groups will perform
at a higher level than nonanxious.
The results of a number of studies
of eyelid conditioning using
groups with extreme scores on the
MAS have upheld these predictions,
anxious Ss showing a greater number
of CR’s than nonanxious (11, 35, 37,
38, 39, 40). In all cases but one (11),
these differences were statistically significant,
the exception involving the
use of only 10 Ss per group, considerably
fewer than were employed in
other investigations. Data from eye-
lid conditioning studies performed in 
the Iowa laboratories and elsewhere 
(39) are also available from Ss scor- 
ing throughout the entire range of 
anxiety scores rather than only at the 
two extremes. The relationship be- 
tween anxiety and conditioning scores 
has been uniformly found to be mono- 
tonic although not always linear, 
middle-anxiety Ss tending to show a 


² In almost all of the studies involving the
MAS, a comparison has been made of extreme
scorers, typically the 20th percentile or below
(nonanxious) and 80th percentile or above
(anxious) in terms of a standardization group
of college students (41). Use of the terms
“anxious” and “nonanxious” groups here
should be understood to refer to such extremes
unless otherwise indicated.


performance level closer to the low- 
scoring than the high-scoring groups. 
The magnitudes of the correlation 
coefficients obtained have been in the 
neighborhood of .25, thus indicating 
that relatively little of the variance 
among Ss can be accounted for in 
terms of anxiety scores. In view of 
the low correlation and the mono- 
tonic relationship between the two 
variables, continued use of extreme 
groups only for research purposes in
such situations seems justified. 
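
The statement that little of the variance is accounted for follows from squaring the correlation coefficient. The short calculation below uses the reported value of .25; the variable names are illustrative only.

```python
r = 0.25                    # approximate correlation reported in the text
variance_explained = r ** 2  # proportion of variance shared by the two measures

print(f"{variance_explained:.4f}")  # 0.0625, i.e., about 6 per cent
```

On this reckoning roughly 94 per cent of the variance in conditioning scores is left unaccounted for by anxiety scores.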
A conditioning study employing a 
response other than the eyeblink has
also been reported in the literature.
An investigation by Bitterman and
Holtzman (1) utilized the PGR technique
which, like the eyelid situation,
it will be noted, involves defense conditioning.
After dividing a group of
randomly selected college students
into the upper and lower 50% on the
basis of MAS scores, these investigators
found a slight but statistically
insignificant superiority in conditioning
level on the part of their anxious
Ss. Since their anxious group included
individuals with scores considerably
lower than those in the investigations
referred to above, this
lack of statistical significance is not
too surprising.
Several studies are available concerning
differential conditioning, also
in the eyelid situation (11, 34, 36).
The predictions derived from the
theory in this instance are that anxious
Ss should exhibit a greater excitatory
strength both to the positive
(reinforced) CS and to the negative
(nonreinforced) CS and further, that
the difference in excitatory strengths
of the two stimuli should be greater
for the anxious group. By transforming
all raw data into excitatory
strength values, Spence and his colleagues
(34, 36) have attempted to
test these predictions in some five
separate instances. In each case, the
excitatory strength to the positive CS 
during differential conditioning was 
significantly greater for anxious Ss, 
as was expected. The results con- 
cerning the remaining two predictions 
were not so clear-cut. In four out of 
five independent instances the excita- 
tory strength to the negative stimulus 
was greater for the anxious Ss but in 
no case was the difference significant. 
In all five cases the difference be- 
tween excitatory strengths was in the 
expected direction but was significant 
in only one instance. While the re- 
sults of these studies tend to lend 
some support to the theory, some- 
what contradictory findings have 
been reported by Hilgard, Jones, and 
Kaplan (11). As mentioned earlier, 
contrary to other studies of simple 
eyelid conditioning, these investigators
found only a slight, statistically
insignificant superiority for anxious
Ss during training to the positive CS.
During differential conditioning, the
anxious group continued to exhibit
an insignificant superiority to the
nonanxious on the positive CS. However,
the responses of the anxious Ss
to the negative CS were significantly
greater as would be expected by drive 
theory. 
Stimulus generalization. Stimulus
generalization, to which differential
conditioning is related, has been investigated
more directly by Rosenbaum
(28) and Wenar (48). Rosenbaum
found greater responsiveness to
generalized stimuli in a spatial situation
for an anxious group than for a
nonanxious group, as would be predicted
by drive theory, but only in
the case of Ss given strong intermittent
shock during their performance;
for groups given a weak shock or
buzzer, no significant differences
emerged. After training groups of
anxious and nonanxious Ss on a key-pressing
response to a strong shock,
weak shock or a buzzer presented at 
regular intervals, Wenar (48) meas- 
ured the reaction time to these stim- 
uli in a test series in which the inter- 
vals of presentation were longer or 
shorter (temporal generalization) 
than those employed during training. 
Reaction time was related signifi- 
cantly to both stimulus intensity and 
anxiety level, response time being 
quicker as these variables increased. 
Maze learning. The first study to 
be concerned with demonstrating 
that the relative performance of anx- 
ious and nonanxious Ss is a function 
of degree of interference within a
task was reported by Taylor and
Spence (43), who used a type of serial
verbal maze. On the assumption
that errors in such a situation are
largely the result of interfering response
tendencies, due to remote associations,
etc., it was expected that
anxious Ss would make more errors
and take more trials to reach a criterion
than nonanxious. The results
of this study and of a subsequent investigation
by Farber and Spence
(8) with a stylus maze have confirmed
these hypotheses, the greater number
of errors and trials to criterion being
made by the anxious groups. An additional
prediction was also made for
these maze data, namely that the degree
of inferiority of the anxious Ss
in comparison to the nonanxious
should be positively related to difficulty
of the choice point. In both
studies, significant rank-order correlations
were obtained between the
difference in number of errors between
groups on an individual choice
point and the difficulty of that point.
Although these results tend to confirm
theoretical expectation, some
discrepancy between prediction and
the experimental findings occurred
on the easiest choice points. In each
investigation, the small number of
errors on the easiest two or three 
points suggests the presence of few 
interfering tendencies so that the 
anxious might be expected to be su- 
perior in performance. Even here, 
however, they tended to be inferior. 
In addition to the two studies utilizing
extreme groups, one study of
stylus maze learning involving the 
entire range of anxiety scores has 
been reported. After splitting a ran- 
domly selected group of college stu- 
dents into 7 anxiety groups according 
to their MAS scores, Matarazzo et al. 
(24) found a linear relationship 
(r=.25) between anxiety and trials 
to the criterion on the maze. 

While the investigations reported 
above have found differences between 
anxiety groups on maze performance, 
Hughes, Sprague, and Bendig (14), 
utilizing extreme groups, failed to
duplicate these results with several
serial verbal mazes. Unlike
the Taylor and Spence study, in which
the typical 2-second rate of stimulus
presentation was employed, Hughes
et al. used a 4-second rate in all cases.
Previous investigations have demonstrated
(12) that performance is positively
related to the interstimulus in-
terval in serial learning but since the 
effects of this variable are poorly un- 
derstood, the implications of the fail- 
ure to find differences between anx- 
iety groups with the 4-second condi- 
tion are not clear. One possibility, 
based on the assumption that differ- 
ences in anxiety level are largely 
determined by situational factors, is 
that under longer time intervals, 
stress upon Ss, and hence upon dif- 
ferences in emotionality between 
anxious and nonanxious, is mini- 
mized. 

Verbal learning. Rather than at- 
tempting to demonstrate an interac- 
tion between anxiety level and degree 


of interference by examining individ- 
ual items within a single task, as was 
done in the maze studies, Montague 
(25) formed three different lists of 
serial nonsense syllables which, be- 
cause of varying degrees of formal 
intralist similarity and association 
value of the syllables, presumably dif- 
fered in the amount of intralist inter- 
ference. A significant interaction was 
found between anxiety and list, an 
anxious group being significantly su- 
perior in performance to nonanxious 
on the list for which similarity was 
low and association value high, and 
the position being reversed for groups 
given a list of high similarity and low 
association value. Similar findings 
have been reported by Lucas (19) in 
a study in which Ss were asked to re- 
call lists of consonants read to them. 
As the number of duplicated conso- 
nants within a list was increased, anx- 
ious Ss showed a significant decrease 
in the amount recalled while the per- 
formance of the nonanxious was not 
affected. 

While a number of investigators 
have employed serial learning tasks, 
from the point of view of testing the 
implications of drive theory, the 
paired-associate technique seems to 
be preferable. Whereas intralist in- 
terferences due to such factors as re- 
mote associations are inherently part 
of serial learning and are thus difficult 
to manipulate, the use of discrete 
S-R pairs permits more precise control 
of the number and strength of the 
response tendencies elicited by each 
stimulus. Turning to the investiga- 
tions that have employed this paired- 
associate arrangement, several stud- 
ies have attempted to minimize the 
presence of competing response tendencies
and thus to demonstrate the
performance superiority of anxious
Ss. In one, Taylor and Chapman
(42) chose nonsense syllables with
low formal similarity, in an attempt
to provide a noncompetitional ar- 
rangement in which each stimulus 
tended to evoke only its own re- 
sponse. As expected, on two lists for 
which such low similarity obtained, 
anxious Ss were significantly superior 
in performance to nonanxious. Simi- 
lar superiority of anxious Ss has been 
reported by Spence (33) on an adjec- 
tive list in which the association be- 
tween each S-R pair was presumed to 
be initially strong and minimum sim- 
ilarity existed among pairs. In a sec- 
ond part of this investigation, an at- 
tempt was made to maximize the 
number of competing tendencies by 
having a high degree of synonymity 
among stimuli. As predicted, an anxious
group in this case was inferior.
The initial strength of association 
between S-R was also manipulated 
by Ramond (26) in an investigation 
involving a variation of the standard
technique,
superiority as the weak responses are
learned and provide competition. 
The results lent some support to these 
expectations, anxious Ss first being
superior and then inferior to nonanx- 
ious although the over-all difference 
between groups did not reach statisti- 
cal significance. 


ANXIETY SCORES AND THEIR 
RELATIONSHIP TO STRESS 


As was indicated earlier, two alter- 
native hypotheses have been enter- 
tained concerning the difference be- 
tween Ss scoring high and low on the 
MAS with respect to anxiety: that 
such groups have different levels of 
chronic anxiety or that the groups 
instead differ in their emotional 
reactiveness to anxiety-evoking stim- 
uli present in a situation.

The studies of verbal learning just 
discussed indicate that whether due 
to chronic or situational factors, dif- 
ferences between high and low scor- 
ing Ss cannot be said to be produced 
only when stress is deliberately introduced
into the situation, either by
means of noxious stimulation as in 
the case of defense conditioning or by 
the administration of stress-provok- 
ing instructions (e.g., reports of fail- 
ure). Consideration of the studies 
into which some threatening stimula- 
tion has been introduced may, how- 
ever, throw some light onto the ques- 
tion as to whether differences in anxiety
among groups could depend, at
least in part, on situational variables.

Should situational factors play a 
role in determining differences in 
emotionality among anxiety groups, 
the strength of the UCS in classical 
conditioning might be expected to be 
related to such group differences.
A comparison of three experiments of
eyelid conditioning from the Iowa
laboratory involving a relatively
strong, medium, and mild UCS, respectively,
was made by Spence and
Farber (35). Examination of the 
mean conditioning scores reveals that 
while intensity of the UCS tended to 
be related to performance, the magni- 
tude of the difference between anx- 
ious and nonanxious remained rela- 
tively constant under the different 
intensities. Different results were ob- 
tained by Spence and his associates 
(37) in a study specifically under- 
taken to evaluate the effect of the 
strength of noxious stimulation on
anxiety groups. In this investigation 
the Ss, selected without reference to 
their anxiety scores, were conditioned 
with a relatively weak UCS, but one 
group was given occasional electric 
shocks between trials, another threat- 
ened with shock, and a third trained 
under neutral conditions. These lat- 
ter Ss, run under neutral conditions, 
gave fewer CR’s than the other 
groups, especially in earlier trials. 

When Ss were later divided into the
upper and lower 50 per cent according
to anxiety scores, it was found
that while the high-scoring group
conditioned without shock or threat
of shock exhibited only a slight, statistically
insignificant superiority in
conditioning performance, the difference
between anxiety groups was
highly significant for Ss with whom
shock or threat of shock was employed.

The previously mentioned studies 
of stimulus generalization by Rosen- 
baum (28) and Wenar (48) were also 
concerned with variations in the intensity
of noxious stimulation, in
both cases a buzzer and two intensities
of shock being employed. While
Rosenbaum found a significant difference
between groups only when
strong shock was used, Wenar’s results
(with a somewhat different experimental
arrangement) indicated a
greater responsiveness for the anxious
group under all three conditions.
Furthermore, the magnitude of the 
difference between groups was unaf- 
fected by stimulus intensity. 

Turning to verbal learning, Deese, 
Lazarus, and Keenan (4) have re- 
ported a study in which the effect of 
electric shock on serial learning was 
investigated. Here it was found that 
nonanxious groups given intermittent 
shocks performed at a significantly 
lower level than a nonanxious control 
group run under neutral conditions. 
In contrast, the performance of the 
anxious groups remained relatively 
constant, Ss run under shock not dif- 
fering from their control group. Fur- 
ther, when all conditions were com- 
bined, the performance of the anxious 
was significantly superior to the non- 
anxious. Thus, while the differences 
between groups increased under 
shock, they were due to the disruptive
effect of the shock on the nonanxious Ss.

Quite in contrast to the results of 
Deese et al. are the findings obtained 


³ Although, presumably, the serial list was
of relatively low intralist similarity, it is 
difficult to tell from the writers’ description 
what drive theory would have predicted 
concerning the performance of the anxiety 
groups, independent of the stress factor. In a
second, parallel, experiment involving a 
more difficult list (12 consonant syllables 
composed of only 5 consonants) presented for 
a standard 12 trials, Lazarus, Deese, and 
Hamilton (17) found no differences among 
groups either as a function of anxiety scores 
or of shock-no-shock conditions. While 
these results appear superficially to be con- 
tradictory both to drive theory (which would 
expect inferiority of anxious Ss) and to the 
results of the first study with respect to the 
influence of shock, inspection of their data 
indicates that all groups averaged only about 
one correct response per trial. Since so little 
learning took place it is not surprising to have 
no differences in performance among groups. 
For this reason it is felt that the study does 
not provide very meaningful evidence on the 
effects of either anxiety level or shock on task 
performance. 
by Gordon and Berlyne (10) in an
investigation of verbal learning utiliz- 
ing psychological stress rather than 
noxious stimulation. After being told 
that the tasks were measures of intel- 
ligence and that their performance 
on a paired-associate list was above 
average, anxious and nonanxious 
groups did not differ significantly in 
amount of negative transfer on a sec- 
ond paired-associate list. An anxious 
group told that their first list per- 
formance was below average, how- 
ever, exhibited significantly more 
negative transfer than did a compara- 
ble nonanxious group. Finally, in 
the Lucas study (19) mentioned ear- 
lier in which the recall of consonants 
lists varying in number of duplica- 
tions was investigated, the effects of 
varying numbers of reports of failure 
to meet expected standards were also 
studied. While nonanxious Ss in- 
creased the amount recalled with 
greater numbers of failure experi- 
ences, the anxious groups did signifi- 
cantly worse, 

available evi- 
t a clear-cut 
the effects of 


defined by telling S h 
achieve adequate st 
intelligence test) have revealed some- 
what different relationships. In both 
instances (10, 19) the performance 
of anxious Ss under stress was significantly
worse than the anxious
group tested under neutral condi- 
tions while the performance of non- 
anxious Ss was in one case the same 
and in the second better than the 
control group. Thus, the magnitude 
of the difference between anxiety 
groups was greater under stress than 
under neutral conditions. 

The available evidence suggests 
then that situational sources of stress 
may play a role in determining the 
difference in anxiety level between Ss 
scoring at the extremes of the MAS. 
Whether the differences between 
groups in the verbal learning studies 
into which no objective stress had 
been introduced by the experimenter 
reflect chronic anxiety level or uni- 
dentified sources of threat remains 
an open question. Speculating on
this point, to many college sophomores
psychology experiments per
se may be seen as somewhat threatening,
particularly when the task could
be interpreted as reflecting on their
personality or intelligence. It is perfectly
possible that in experimental
arrangements involving no noxious
stimulation or stress-inducing instructions
which call upon skills not
particularly valued by college students,
differences between groups
might disappear.³
Using the results of these studies
involving stress to attempt to determine
the source of anxiety differences
between high- and low-scoring Ss or,
for that matter, to test drive theory,
involves the assumption that the only
effect of stress in any situation is to
increase drive level or, at least, that 


"A Study of classical reward conditioning of 
the salivary response by Bindra, Paterson,
and Strzelecki (On the relation between anxiety
and conditioning, Canad. J. Psychol., 1955,
9, 1-6) which appeared after this review was
written confirms this suggestion. No difference
was found between anxious and nonanxious
groups.
anxious and nonanxious groups do
not respond differentially to stress
except with respect to anxiety or
drive. Although no systematic exploration
has been made of the rela-
tionship between degree of noxious 
stimulation and performance on vari- 
ous types of tasks, an examination of 
the general literature concerning the 
effect of such stimulation in nonver- 
bal, noncompetition situations lends 
some credibility to this assumption 
(32). It is important to note that 
with one exception (4) the studies of 
the effects of noxious stimuli on anx- 
ious and nonanxious Ss have em- 
ployed tasks of this type. 
In contrast, the literature concerning
studies of psychological stress
(e.g., ego-involving instructions, reports
of failure), most of which have
employed quite complex tasks, sug- 
gests that factors other than or in 
addition to drive level are involved. 
The variety of roles or effects that 
stress may have in addition to the 
motivational one has been discussed
by Lazarus, Deese, and Osler (18)
and more recently by Farber (7).
Particularly pertinent to the present
discussion is the finding that there 
are wide individual differences in 
response to such stress, some individ- 
uals improving in performance, others 
decreasing, and still others being un- 
affected. The direction of the effect 
of stress has further been related to 
personality variables (18).
The Ss scoring at the extremes of the
MAS continuum may react to such
stress with characteristically different
patterns as well. Thus, it is possible
that with increasing degrees of
stress, differences between anxious
and nonanxious other than drive may
be aroused and become responsible,
at least in part, for the discrepancy 
between the performance levels of 
such groups. 
Unfortunately, the two available
studies involving psychological stress 
do not permit an evaluation of this 
suggestion (nor of the possibility that 
stress of any type, physical or psycho- 
logical, may have a similar effect in 
tasks of sufficient complexity). Both, 
it will be recalled, used learning tasks 
of such a type that an increase in 
drive level might be expected to re- 
sult in deterioration of performance. 
Thus, it could be argued that the anx- 
ious were “threatened” (had their 
drive level increased) by the stress 
instructions and hence deteriorated in 
performance in comparison to their 
neutral control group while the fact 
that the nonanxious under stress did 
not show a similar inferiority merely 
indicates that they were emotionally 
unaffected by the stress conditions. 
The only hint that more might be in- 
volved than drive level is contained 
in the Lucas study in which non- 
anxious improved with a greater 
number of failure experiences while 
the anxious became worse. Such a 
finding suggests further that these 
additional factors, if any, might act 
in the direction of interfering with the 
performance of anxious Ss and of 
facilitating the performance of non- 
anxious. Additional research upon 
the effects of stress on anxiety groups, 
particularly with tasks of different 
levels of complexity is certainly 
needed to provide information about 
these possibilities. 

The suggestion that at least psy- 
chological stress may have other than 
drive effects on anxious and nonanx- 
ious Ss in complex tasks bears some 
resemblance to the empirical predic- 
tions proposed by Sarason and Man- 
dler and their associates (22, 23, 29) 
for the performance of groups selected 
by a different measuring instrument, 
a questionnaire of “test anxiety,” 
designed to select individuals reacting
with different degrees of anxiety
to intelligence tests and course ex- 
aminations. These investigators hy- 
pothesized that such high-anxious 
individuals react to an experimental 
situation represented as a test of in- 
telligence or the like (thus, according 
to their conception, creating stress) 
not only with more anxiety or drive 
than low-anxious but also, as a result 
of past learning, have evoked by 
their anxiety irrelevant response tend- 
encies which interfere with task per- 
formance. Under increasing stress 
(such as reports of failure) the per- 
formance of high-anxious Ss worsens 
because of the arousal of a greater 
number of these irrelevant tenden- 
cies, offsetting the facilitating effects 
of drive; the performance of the low- 
anxious, however, improves with 
greater stress due to an increasing 
drive level, unaccompanied by irrele. 


sponse patterns jin addition 
only for high i 
suggestion of 
that additional fac 
ited under stress for b 
tremes although their 
formance may be in 
direction. 

Although Sarason and 
leagues have confined th 
to “test anxiety” 
primarily, on intelligence-test items
under stressful conditions, Child (3) 
has proposed that all the work done 
with Ss scoring at the extremes on 
the MAS, independent of whether 
stress is introduced, could be more 
plausibly explained by such an inter- 
ference theory. These task-irrelevant 
responses are always present in anx- 
ious Ss, as well as a higher drive level, 
Child states, but they disrupt performance
only in complex situations
“where the subject is already in con- 
flict between various response tend- 
encies relevant to the task [so that] 
the presence of irrelevant response 
tendencies heightens the conflict and 
interferes with performance to a 
greater extent than increased drive 
improves it” (3, p. 154). 

It would appear to the present
writer that a theory that attempts to
attribute all inferiority of performance
to irrelevant tendencies would
either be forced to predict that anxious
Ss would always be inferior to
nonanxious in such complex tasks as
verbal learning (since it seems hard
to maintain that even with verbal
materials having little intratask interference,
irrelevant extratask responses
could not interfere with performance)
or, if already obtained results
are to be explained, that anxiety
level and its correlated irrelevant response
tendencies would shift up and
down abruptly from task to task and
even from stimulus to stimulus within
a task as the number of competing
response tendencies directly elicited
by a stimulus varied. Tying the
number of extratask responses to the
number of intratask interferences
would seem merely to be adding one
more variable to those considered by
drive theory without making different
predictions in the situations to
which drive theory has been thought
to be applicable.
It is interesting to note that the
suggestions being proposed here concerning
the possible role of response
as well as drive differences in the
performance of anxious and nonanxious
Ss in stress situations lead to a
different prediction than do Child’s
hypotheses in certain cases. Accord- 
ing to the present writer, on verbal 
tasks in which anxious Ss are demon- 
strated to be superior to nonanxious 
under neutral conditions, the intro- 
duction of stress might be expected 
to minimize this difference between 
groups or even to reverse its direc- 
tion, the performance of anxious 
Ss being lower than under neutral 
conditions and the nonanxious possi- 
bly being higher. Child, while per- 
haps also expecting nonanxious Ss to 
be better under stress than under 
neutral conditions, would be forced 
to predict that an anxious group 
under stress would be the same as or 
even superior to its neutral control 
group rather than worse. That is, the 
fact that under neutral conditions the 
anxious Ss perform at a higher level 
than nonanxious would indicate, 
according to Child, that this was a 
situation in which making irrelevant
responses does not interfere with task
performance, the difference between
groups in favor of the anxious being 
due, then, to their higher drive.
While stress might increase the drive
level of anxious Ss and hence the
magnitude or number of the task-
irrelevant responses, these latter 
would still not compete with task- 
relevant responses since the task is 
the same. 

Still another interpretation of the 
relationship between anxiety and 
stress has been suggested, the predictions
of which are quite opposed
to any of those previously discussed. 
On the basis of their findings with 
serial learning that the performance 
of nonanxious groups deteriorated
with shock while that for the anxious 
did not, Deese, Lazarus, and Keenan 
(4) suggested that the MAS measures 
not so much anxiety as how indi- 
viduals defend themselves against 
anxiety, and further, that MAS scores 
are related to the hysteria-psychas- 
thenia continuum. The latter pro- 
posal arose from the finding that 
(with overlapping items excluded) 
there was a positive correlation of 
.40 between the MAS and the Psychasthenia
(Pt) scale on the MMPI
and a —.23 correlation between the 
MAS and the Hysteria (Hy) scale. 
By assuming that nonanxious Ss are 
hysterical individuals who are unable 
to maintain their defenses in the face 
of objective inescapable stress (e.g., 
shock, as opposed to psychological 
stress), and therefore are greatly dis- 
turbed by it while the anxious are 
psychasthenic and therefore react to 
objective threat coolly and intellectu- 
ally, they believe their results become 
intelligible. The same explanation 
has been offered by Eriksen (5), who 
found that Ss scoring high on the Hy 
scale exhibited more stimulus gen- 
eralization in an investigation involv- 
ing shock than did high Pt Ss. These 
results, Eriksen stated, were inex- 
plicable in terms of drive theory. In 
attempting to evaluate these hypotheses
(and leaving aside any questions
of the clinical validity of the
various measures employed) it might
be well to inject a historical note. In 
developing a scale for the selection of 
Ss, the present writer deliberately 
attempted to include items descrip- 
tive of overt or manifest anxiety and 
avoided including items describing 
behavior not itself “anxious” but 
said to be a defense against an in- 
ternal anxiety precisely because it 
was the purpose of the scale to select 
Ss differing in functioning anxiety 
level in the experimental situation;
to the extent that defenses were effec- 
tive in keeping anxiety at a mini- 
mum, inclusion of “defense items” 
on the scale would have been self- 
defeating. 

The conflict between the hypothesis
of Eriksen, Deese, et al., and the
assumptions made by drive theorists 
in using the MAS is not whether some 
individuals scoring low on the scale 
are potentially anxious individuals 
with good defenses, but rather 
whether the introduction of special 
conditions such as shock so affects a
sufficient number of low scoring Ss 
as to wipe out or reverse the direc- 
tion of difference in drive or emo- 
tionality between low- and high-scor- 
ing groups that exists under neutral 
conditions. If Ss are thus affected, 
drive theorists must either abandon 
the MAS for a different selective in- 
strument, or restrict themselves to 
testing groups in situations in which 
defenses are assumed to be Operating, 


f the available 


| evelopment of a 
comprehensive theory of anxiety as a 


personality phenomenon), 
the results of De 
deviate; no othe 
volving noxious 
psychological stre 
hysterical defense 
sults that would 
anxiety level of | 
creased up to or beyond that of the 
high scoring Ss, 

has any differential effect at all, it
appears to be in the direction of increasing
the anxiety of the anxious
group proportionately more than the 
nonanxious. Examining the Eriksen 
results and accepting them as reli- 
able, there seems to be no firm basis 
for suggesting that drive theory 
would have predicted more stimulus 
generalization for the high Pt group
than the high Hy. Such a claim rests 
on the assumption that all nonanx- 
ious Ss would be low Hy and all anx- 
ious high Pt. The magnitude of the 
reported correlation coefficients, par- 
ticularly between the MAS and the 
hysteria scale does not make this 
assumption seem too reasonable. 
Even if high Hy Ss do become disturbed
under nonescapable stress, a
sufficient number of Ss could remain
in the nonanxious group who were
“genuinely” nonanxious, or whose defenses
remained intact, to have a nonanxious
group exhibit less stimulus
generalization than the anxious.
More relevant than such armchair
argument, however, are Rosenbaum’s
(28) results. Using an experimental
arrangement very similar to Eriksen’s,
Rosenbaum found, it will be
recalled, more stimulus generalization
for anxious than nonanxious;
and even more important, that the
difference between groups was significant
only under the conditions of
strong shock. 
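
The point made above about the size of the reported correlations can be put in numbers. The short Python sketch below is only an illustration: the function name is invented here, and it assumes a simple linear relation, so that the square of a correlation gives the proportion of variance two scales share. The two coefficients are the ones quoted earlier for the MAS against the Pt and Hy scales.

```python
def shared_variance(r: float) -> float:
    """Proportion of variance two measures share, given their correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


# Correlations quoted in the text (overlapping items excluded).
reported = {"MAS vs. Pt": 0.40, "MAS vs. Hy": -0.23}

for label, r in reported.items():
    # Squaring removes the sign, so only the strength of the relation remains.
    print(f"{label}: r = {r:+.2f}, shared variance = {shared_variance(r):.1%}")
```

On this reckoning the MAS shares about 16 per cent of its variance with the Pt scale and about 5 per cent with the Hy scale, which leaves ample room for nonanxious Ss who are not high on Hy and anxious Ss who are not high on Pt.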


MAS AND CLINICAL MEASURES
OF ANXIETY

As was indicated earlier, the meaning
of the term “anxiety” as used in
the studies attempting to determine
the relationship between drive and
performance has been only in terms
of MAS scores. While such pure
operationism is methodologically
sound, the generality of these results
would be considerably expanded were
a relationship established between
the MAS and more common clinical
definitions of anxiety. Most valuable
would seem to be a comparison of
scale scores with observers’ ratings
of overt behavior since other diagnostic
tests of anxiety are themselves
purported to be indicators of such 
behavior. Fortunately, several stud- 
ies relating MAS scores and obser- 
vational data have been carried out. 
In the first of these investigations, 
reported by Gleser and Ulett (9) of 
Washington University, a psychia- 
trist rated 151 normal individuals 
and 40 psychiatric patients with 
overt anxiety as a prominent symptom
after an hour interview with each
subject. Ratings were made on an 8-point
scale of anxiety-proneness, defined
as the tendency for overt anxiety
symptoms to appear in a stressful
situation. For the total group the
correlation between these ratings
and MAS scores was .61. Other simi- 
lar studies by the Washington group 
(45, 46) with more restricted samples 
indicated lower coefficients. In a 
study of 110 male students, involving 
the judgments of two psychiatrists, 
the ratings correlated .28 and .29 
with MAS scores for the two raters, 
while the interjudge reliability was 
.28 (46). All correlations were significant.
Lastly the Washington
group reported a coefficient of .40 between
the ratings of a single psychiatrist
and anxiety scores for 141 normal
Ss (45).
In a student-counseling-center
setting, Hoyt and Magoon
asked experienced counselors to
rate their own clients (N = 289) into
one of three groups: high, medium,
or low manifest anxiety. Comparing
the MAS scores for each of the
rated anxiety groups, an extremely
significant chi square was
found, while the contingency coefficient,
used as an estimate of the r to be
expected if the variable had been
continuous, was .47. Using a still different
criterion of clinical anxiety,
Kendall (16) had pairs of nurses rate 
TB patients on their ward on a 7- 
point rating scale for each of nine 
aspects of manifest anxiety. Selecting 
from the 93 patients so rated the up- 
per and lower 27% in terms of MAS 
scores, Kendall compared the differ- 
ence in mean over-all anxiety ratings 
for the two groups and found it to be 
statistically insignificant; taking only 
the upper and lower 13% on the 
MAS, a very significant t between
mean ratings was obtained. 
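
For readers unfamiliar with the contingency coefficient mentioned above, the following Python sketch shows how one is obtained from a chi square. It assumes the coefficient in question is Pearson's uncorrected C = sqrt(chi² / (chi² + N)), which the text does not state. The table of counts is hypothetical, invented here only so that it sums to N = 289; it is not the data of the study, and the function names are illustrative.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and total N for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def contingency_coefficient(chi2, n):
    """Pearson's uncorrected contingency coefficient C."""
    return math.sqrt(chi2 / (chi2 + n))


# HYPOTHETICAL counts: rows are rated anxiety (high, medium, low),
# columns are MAS score bands (high, middle, low). Total N = 289.
table = [
    [40, 30, 10],
    [35, 50, 35],
    [10, 30, 49],
]

chi2, n = chi_square(table)
print(f"N = {n}, chi square = {chi2:.2f}, C = {contingency_coefficient(chi2, n):.2f}")
```

Under the same assumption the formula can be run backwards: with N = 289, a C of .47 corresponds to a chi square of roughly 82.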

Finally, a study by Buss, Wiener, 
Durkee, and Baer (2) represents one 
of the few investigations utilizing 
hospitalized psychiatric patients. 
Each of their 64 patients was inter- 
viewed and then rated by four psy- 
chologists on nine aspects of directly 
observed and reported anxiety. Cor- 
relations between judges’ pooled rat- 
ings and MAS scores ranged between 
.16 to .68 for these various aspects; 
the correlation with an over-all rating 
of anxiety was .60. 

The variation in the training of the 
raters, opportunity for observation, 
rating scales, and populations from 
which the subjects were drawn makes 
it difficult to formulate any statement 
about the “validity” of the MAS. To 
the extent that all of these observa- 
tional criteria are themselves cor- 
related and are agreed to be clinically 
acceptable indices of manifest anx- 
iety, there does seem to be some rela- 
tionship between MAS and observed 
behavior. These results suggest, 
then, that the experimental results 
obtained with the anxiety scale might 
also hold for groups selected accord- 
ing to clinical criteria. Such studies 
as have been reported about the per- 
formance of clinically selected anx- 
ious groups on comparable tasks tend 
to confirm this suggestion (1, 20, 30, 
47). 
In addition to the experimental
studies of the performance of anxious 
and nonanxious groups already dis- 
cussed, a number of other investiga- 
tions have reported differences in the 
behavior of anxious and nonanxious 
Ss, ranging from indications of num- 
ber of food aversions (31) to per- 
formance in problem-solving tasks 
(21, 49). The exclusion of these many 
experiments from consideration here, 
due to the limited purpose of this 
paper—that of assessing the evidence 
directly relevant to drive theory— 
points up what has not always been 
fully appreciated about this theory. 
It is an extremely restricted one, re- 
ferring only to the effects of drive 
level (rather than all characteristics 
of anxious and nonanxious individuals)
in relatively simple
situations. The major prediction of
the theory, that there is an interaction
between anxiety level and task
complexity, seems to be fairly well
substantiated by experimental evidence,
although more exact predictions
have either not been tested as
yet or have not fared as well.
Whether the theory can be successfully
applied to more complex situations
than those for which it originally
seemed appropriate, as some
have attempted to do, or whether
additional variables can be added to
it and thus broaden its usefulness remains
for future research to determine.
BEHAVIORAL EFFECTS OF IONIZING RADIATIONS


ERNEST FURCHTGOTT¹
University of Tennessee 


Psychology has not been bypassed
in the current general interest
in ionizing radiations. Since World
War II a number of laboratories
maintained by the U.S. government
have conducted research in this area.
In addition, several research projects
have been sponsored by government
agencies in non-Federal institutions.
On March 31, 1955, there were in
progress no less than seven such
separate projects having no security
classification (50). All of this activity
would seem to warrant a brief review
of the problem.


The Stimuli 


The biological effects of high energy
radiations are ascribable primarily
to changes brought about in
cells by ionization, defined as the
removal of electrons from atoms.
Different types of radiations produce
biological effects differing primarily
quantitatively, rather than qualitatively.
Two general classes of radiations
may be distinguished.

1. Material radiations consist of
streams of particles which transfer
their kinetic energy to the targets
which they strike. The particles, differing
in mass and/or electrical
charge, are neutrons, alpha particles,
electrons (beta particles), deuterons, or
protons. These radiations have been
utilized only very rarely in behavioral
studies.

¹ I wish to express my gratitude to Dr. S. R.
Tipton of the U.T. Department of Zoology
for critically reading portions of the manuscript.

2. Electromagnetic radiations consist
of oscillating electric and magnetic
fields. They also display
corpuscular (photon) properties. Psychologists
are familiar with “light”
rays which lie in the frequency range
of 10% cycles per second (wavelength
range 9×10^-5 to 4×10^-5 cm.).
Radiations above 10" cycles per second
are capable of ejecting inner
electrons from atoms. Radiations in
the 10^18 to 10^20 cycles per second range
(10^-8 to 10^-10 cm.) are called X rays;
those between 10^19 and 10%? cycles per
second (10^-9 to 10-" cm.) gamma rays
(the latter are usually produced by 
oscillating currents within the atomic 
nuclei themselves). Gamma rays 
often accompany the disintegration 
of radioactive substances. 
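
The frequency and wavelength figures given for each band above are tied together by the relation wavelength = c / frequency. The Python sketch below is a minimal illustration of that conversion in the centimeter units used here. The function names and the sample frequencies are chosen for this example only, and the speed of light is rounded to four figures.

```python
SPEED_OF_LIGHT_CM_PER_S = 2.998e10  # speed of light in vacuum, cm per second (rounded)


def wavelength_cm(frequency_hz: float) -> float:
    """Wavelength in centimeters for a frequency in cycles per second."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return SPEED_OF_LIGHT_CM_PER_S / frequency_hz


def frequency_hz(wavelength_in_cm: float) -> float:
    """Frequency in cycles per second for a wavelength in centimeters."""
    if wavelength_in_cm <= 0:
        raise ValueError("wavelength must be positive")
    return SPEED_OF_LIGHT_CM_PER_S / wavelength_in_cm


# Illustrative inputs: two frequencies a factor of 100 apart.
for f in (1e18, 1e20):
    print(f"{f:.0e} cycles per second -> {wavelength_cm(f):.2e} cm")
```

Because the relation is a simple reciprocal, a hundredfold increase in frequency always corresponds to a hundredfold decrease in wavelength.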

The relative biological effectiveness 
of various radiations is a function not 
only of the total number of ions 
formed, but also of the spatial distribution
of the ions in the tissues.
The terms linear ion density or linear 
energy transfer are used to express 
the relative density of ionization 
per unit length of tissue. Beta and 
gamma rays produce 6.3-11 ions 
per micron of tissue, 1,000 kv. X rays 
approximately 15, 200 kv. X rays 80 
and lower voltage X rays a still 
higher number. Ionization following 
neutron radiations produces up to 
9,000 ions per micron of tissue (26, 
p. 118). Biological effectiveness of
radiation may increase with, decrease with, or be
independent of linear energy transfer.
Thus some activities are affected 
more by alpha particles than by 
gamma rays, while in other functions 
the reverse may be the case. In mam- 
mals we usually find that effective- 
ness increases with ion density. 
Measurement of Radiations

Ideally we would like to measure 
the actual amount of ionization in 
tissues, but this is not possible. We 
must be satisfied with specifying the 
physical characteristics of the source 
and the target. Ionization of air is
actually used as an approximation for ionization
in tissues. In the case of X and
gamma rays the unit roentgen, r., is
defined as that quantity or dose of
X or gamma radiation which produces
in 0.001293 g. of air one electrostatic
unit of ions (37, p. 90).
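
The definition of the roentgen can be turned into a count of ions with a line or two of arithmetic. The Python sketch below is offered as an illustration only. It assumes that every ion carries exactly one elementary charge and uses a rounded value for that charge in electrostatic units; the constant and function names are invented for this example.

```python
ELEMENTARY_CHARGE_ESU = 4.803e-10  # charge of one singly charged ion, in esu (rounded)
AIR_MASS_G = 0.001293              # mass of air named in the definition of the roentgen


def ions_per_roentgen() -> float:
    """Singly charged ions of one sign that together carry one electrostatic unit."""
    return 1.0 / ELEMENTARY_CHARGE_ESU


def ions_per_gram_of_air() -> float:
    """The same count scaled from 0.001293 g of air up to one gram."""
    return ions_per_roentgen() / AIR_MASS_G


print(f"per roentgen in 0.001293 g of air: {ions_per_roentgen():.3e}")
print(f"per roentgen per gram of air:      {ions_per_gram_of_air():.3e}")
```

Under these assumptions one roentgen corresponds to about 2.1×10^9 ions of either sign in the stated mass of air, or about 1.6×10^12 per gram of air.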

In the case of material radiations 
a different unit is used, the roentgen 
equivalent physical, rep, which is 
“that quantity of ionizing radiation 
which will produce 1.6×10^12 ion
pairs per gram of tissue” (37, p. 436). 
Occasionally, the roentgen equivalent 
man, rem, unit is used which is that 
“quantity of radiation which when 
absorbed by man produces an effect
equivalent to that produced by absorption
of one roentgen of X or
gamma radiation” (37, p. 436).

A few values will be cited to make 
the roentgen unit more meaningful. 
The safe human daily whole-body 

between 0.05 

50:25 r. per day (37, p. 436; 64, p. 
The threshold for the mitotic
effect in the grasshopper is 8.0 r.
(64, p. 89). The 30-day 50 per cent
lethal dose after 100-250 kv. X-ray
whole-body exposure is about 315 r.
for the dog, around 500 r. fo;
(55, p. 930).


GENERAL PRINCIPLES
OF RADIOBIOLOGY
It was pointed out previously that
radiation-induced effects result primarily
from ionization producing
physicochemical changes in the living
cells. Two general theories concerning
the mode of action of radiation
have been put forward. According
to the target theory certain molecules
of the cell are especially radiosensitive
and it is the change in these
specific parts which accounts for the
observed radiation effects. Opposing
theorists suggest that radiation affects
the cell as a whole by releasing
certain chemical agents which interfere
with the normal cell metabolism.
Actually there is evidence supporting
both viewpoints. For purposes
of this review it is not necessary to
examine this problem any further.
The following variables are important
in the study of radiation effects (55):

1. Doses. In most instances
effects are directly related to the dose.

2. Rate of delivery or dosage (doses
accumulated over a period
of time). In most cases effectiveness
of a given dose decreases with a decrease
in rate of exposure. Recovery
may account for this. For example,
in the monkey a single dose of
r. applied to the spinal cord produces
paraplegia, but two daily 5,000 r.
doses or five daily 3,000 r. doses are
required (48).

3. Type of radiation. Usually in
mammals effectiveness is directly
related to the specific ion density of
the radiation.

4. Manner of exposure. Effects due
to total-body irradiation are different
from those in which only a selected
part of the organism is exposed.
Shielding of certain parts of
the body (spleen, extremities, etc.)
can decrease the effectiveness of
total-body exposure. This is especially
important in the study of
effects on the c.n.s. since doses larger
than the median lethal total-body
dose are necessary for changes to be
apparent.

5. Time after exposure that observations
are made. Many of the physiological
effects exhibit latencies. These
may be of varying order of magnitudes
ranging from seconds to years.

6. Species differences. The 30-day
LD50² for total-body X irradiation for
the rabbit is approximately 800 r.,
for the guinea pig 200-400 r., rat
600-700 r., monkey 500 r. (55, p.
930).

7. Sex differences and individual
differences within the same species. For 
example, the same dose of X rays 
kills more male than female mice, 
but affects the weight of females to a 
greater extent (7). 

8. Conditions of the organism. Conditions
which may be called “stress,”
i.e., deviation from normal resting
state, usually enhance effectiveness
of radiations. Vitamin deficiencies,
infections, low temperatures in unacclimated
animals, exhaustive exercises,
adrenalectomies all seem to
increase radiation effects.

9. Drugs and anoxia. Certain
drugs like cysteine, glutathione, alcohol,
and anoxia actually depress
radiation effects.

10. Reproductive activity of the tissue.
As early as 1906 Bergonié and
Tribondeau hypothesized that proliferating
tissues are usually most
radiosensitive. We find, for example,
that while the nervous system of
adult organisms is relatively radioresistant
the embryonic neurons are
extremely radiosensitive.

Radiation sensitivity varies considerably
from tissue to tissue. For
a detailed discussion the reader may
consult the radiation literature. We
shall mention here only a few of the
effects of interest to the psychologist.

² Dose required to kill 50 per cent of the
animals during the first 30-day postirradiation
period.

The hematopoietic system is extremely
radiosensitive. A decrease
in the number of circulating lymphocytes
is one of the most sensitive indicators
of radiation overexposure.
Other blood components also show
pathological changes. Hemorrhagic
manifestations are also quite com- 
mon after acute irradiation. Vascu- 
lar changes are major contributors 
to the brain pathologies observed 
after large doses of irradiation (9). 
Generalized circulatory changes are 
only minor after median lethal doses, 
but with larger doses the effects are 
more pronounced (56). 

There is some disturbance in water 
metabolism. Several studies have 
reported changes in water intake 
after X irradiation (15, 53, 54). 

The gastrointestinal tract is ex- 
tremely radiosensitive. Anorexia, 
nausea, and vomiting are among the 
clinical symptoms of radiation sick- 
ness (overexposure to radiation). 
Depression of food intake and a loss 
of body weight can be observed in ir- 
radiated animals. The magnitude 
and duration of the depression are a 
function of the dosage (63). Loss of 
body weight can be thus used as an 
indication of radiation sickness. 

The endocrine glands, except for 
the gonads, are relatively resistant 
to radiation damage. However, radiations
act as “stressors” and they
give rise to the well-known pituitary- 
adrenocortical stress response (56). 

The cornea, conjunctiva, and the 
lens of the eye are also quite radio- 
sensitive but the latency of human 
radiation cataracts may be measured 
in terms of years (56). 

Muscle is very resistant to radia- 
tion. The nervous system is also rela- 
tively radioresistant. Both will be 
considered in greater detail further 
on. 


PRE- AND NEONATAL RADIATION 


An excellent review of the effects 
of prenatal irradiation has been writ- 
ten by L. B. Russell (61). 

One of the crucial variables in pre- 
natal irradiation is the stage at which 
exposure occurs. Russell (61) divides 
the mammalian gestation period into
three stages: preimplantation, major
organogenesis, and period of the fetus.
In the rat these periods correspond 
to the following postconception days: 
0-7, 8-15, 16-term. During the pre- 
implantation period radiation pro- 
duces a high percentage of prenatal 
deaths, but the survivors are usually 
normal. The radiation during the 
period of major organogenesis results 
in lower prenatal mortality, but it is 
the most sensitive period for the pro- 
duction of morphological abnormali- 
ties. Radiation during the period of 
the fetus produces lesser changes. 
Among the most sensitive systems 
during the prenatal Period is the cen- 
tral nervous system, Russell (61) 
quotes studies dating back to 1907 
which show marked morphological 
changes following X irradiation. In a 
series of studies on rats and mice Hicks 
(29, 30) showed that X irradiation 
during different stages of the gestation 
period affects different parts of the 


Irradiation during the 
first eight days of embryonic life 
effects on t 


periods are only indi 
frequently occurring 
that there is no ex. 
between age of irra 
cific malformations, 
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be pointed out that it is rather diffi- 
cult to determine the precise age of 
embryos. Wilson and co-workers
(68, 69) have shown that neural dam-
age is directly related to dosage.
They irradiated rats on the ninth and
tenth day of gestation with doses
ranging from 25 to 400 r. On the
ninth day 25 r. produced ocular mal-
formation in only a small percentage
of animals; 50 r. affected 72 per cent
of the animals; 100 r. produced
anophthalmia, microphthalmia, or
other ocular malformations in 90 per
cent of the animals; 200 r. proved
fatal to all embryos. Brain damage
showed similar trends. The data for
the animals irradiated on the tenth
day of gestation were similar, except
that the doses required were higher.
Fifty r. had little effect, but 100 r.
resulted in anomalous eye develop-
ment in 75 per cent of the cases.
In this connection a study by Rugh
et al. (59) is of some interest.
Fetuses 13.5 days old were exposed to
300 r. of X irradiation. In cases
examined four hours after exposure
the retinae revealed massive damage.
On the other hand, animals examined
six to seven days after birth had no
signs of injury. Apparently a re-
covery process took place not by re-
pair of the damaged cells, but by
proliferation of the more radiore-
sistant precursor neuroectoderm cells.
There are a number of clinical re-
ports of various abnormalities such
as microcephaly, hydrocephaly, men-
tal deficiency, ocular malformations,
blindness, and other types of
malformations which are ascribed to
fetal X irradiation (25, 49). Micro-
cephaly is the most frequently re-
ported abnormality, accounting for 17
of the abnormal cases in one study (49). In
some clinical studies, however, no
damage is reported following pelvic
irradiation during pregnancy (61, p.
909). It is possible that the exposure
in the latter cases occurred after the
critical period. In the study of 30
pregnant women who showed one or
more major signs of radiation follow- 
ing the Nagasaki atomic bomb blast, 
four out of sixteen surviving children 
showed signs of mental retardation 
(70). The report does not specify the 
nature or extent of the deficit. 

So far only two studies have been 
reported which measured specific be- 
havioral consequences of prenatal 
irradiation. Levinson (42) fetally ir- 
radiated rats with 300 to 600 r. X 
rays on the 11th, 13th, 15th, 17th, 
and 19th postconception days. When 
the animals were 50 days old they 
were tested on a Lashley Type III
maze. Learning, measured in terms
of number of trials necessary to reach
a criterion, number of errors, and
time spent in the maze, was impaired,
with the deficits directly related to
the radiation dose. Radiation on the
13th day produced the greatest
changes. This agrees roughly with
Hicks’ timetable for cortical damage 
(29, 30). Variability was larger in the 
experimental groups than in con- 
trols. Tait et al. (65) X-irradiated 
rats during the final week of preg-
nancy using 30, 90, 180, and 360 r.
The offspring of the animals receiv-
ing 90 or more r. were significantly
poorer maze learners than control
animals. 

Summary. While there is a great
deal of evidence for the relative radi-
osensitivity of the fetal nervous sys-
tem, our behavioral data are rather
scant. We do not know what kinds of
activities aside from maze learning
are affected nor the lower thresholds
of radiation-induced changes. The
latter may be of practical signifi-
cance.


THE ADULT NERVOUS SYSTEM


It has been known for a long time
that the adult nervous system is rela-
tively radioresistant. Doses in the
median total-body lethal range pro- 
duce no observable neural changes. 
However, with larger doses, in the 
case of mammals generally over 1,000 
r., a number of investigators have ob-
tained definite signs of neural degen-
eration in a variety of organisms:
man, monkey, dog, rat, rabbit, and fish
(1, 2, 6, 9, 10, 12, 27, 31, 45, 46, 48,
59, 61). In general the amount of de-
generation observed is directly re-
lated to the dose, whereas the latency
is inversely related to it
(2, 9, 10, 12, 31, 59). With
relatively low doses, a few thousand 
r., the latency may be a matter of 
months, a year, or longer (2, 10, 27, 
45). Many investigators assume that 
the initially observed neuronal dam- 
age is a secondary effect resulting 
from damage to the vascular system 
in the brain (6, 9, 27, 58, 60). Some 
recent studies, however, deny the 
necessity of this assumption (2). It 
might be mentioned here also that 
because of certain methodological 
advantages the use of radioactive 
cobalt has been proposed for the pro- 
duction of circumscribed brain le- 
sions (62). 

Aside from histological studies, we 
have information on functional 
changes. Reflex excitability decreases 
as a function of dose, with high doses 
abolishing the reflex completely (19, 
20). Frequently enhancement pre- 
cedes the depression (2, 23, 39). But 
again it should be emphasized that 
median total-body lethal doses pro- 
duce no easily measurable changes 
(13). 

In a study in which the heads of
rabbits were irradiated using 12,500
r. (23), a convulsive phase with
grand mal seizures appeared after a
latent period of 30 minutes. This
was followed by a somnolent phase
of two hours’ duration in which the
animals were quite inactive. Finally,
in the last stages before death, ataxia
was the most pronounced symptom. 
Changes in equilibrium and disorien- 
tation in space have been reported by 
a large number of investigators (1, 
10, 46, 52, 58, 60). This is in accord 
with several histological reports that 
the brain stem and cerebellum are 
the most frequent sites of radiation 
necrosis. Hemi- or quadriplegia is a 
common symptom after large doses 
(2, 10, 46, 48, 58). 

EEG changes have been recorded 
by several investigators (2, 9, 13, 39, 
58), but again the lower threshold is 
above the median total-body lethal 
dose. The typical pattern is similar 
to that seen in seizures, i.e., periodic 
spikings, high amplitude slow waves. 

The most sensitive parts of the
brain are the hypothalamus, glial 
cells, brain stem including the me- 
dulla and the cerebellum (2, 6, 9, 10, 
12, 31). The cortex is more radio- 
resistant than these structures, and 
this is of course significant in be- 
havioral work. 

The peripheral nervous system is
even less sensitive than the c.n.s. to
radiations (32). Doses below 10,000
r. are ineffective. It takes 45,000-
75,000 r. to abolish nerve conduction
in peripheral fibers (22). 
nomic n.s. responds wij 


Skeletal muscles are also relatively 
radioresistant. With doses below


BEHAVIORAL CHANGES


Almost since the discovery of X
rays investigators have reported vari-
ous changes in organisms following
radiation. Lyman et al. (46) in an
exhaustive 1933 review of these
changes refer to a study by Tark-
hanov, who in 1896 observed quieter
behavior in flies following X irradia-
tion. There is also an abundance of
individual clinical case studies in
which radiation was applied for ther-
apeutic purposes. This review will
emphasize primarily those studies,
however, which were designed spe-
cifically to investigate behavioral ef-
fects. The latter include the
phenomena customarily included in
the field of psychology.


Learning and Performance 


The first attempts to assess the
effects of radiations on learning were
performed in Pavlov's laboratory.
Nemenow (51, 52) irradiated the
head of one dog with a dose of 1,500 r.
There was only a slight change in
his salivary CR’s. After an addi-
tional 2,200 r., however, the CR’s
practically disappeared for a period
of five weeks. A second dog received
3,500 r., then again 2,800 r., and the
results were essentially similar to
those seen in the first animal.
Lyman et al. (46) X-irradiated the oc-
cipital part of the head of four dogs
with massive doses of at least 17,000
r. after their CR’s had been stabil-
ized. All of the animals showed a
temporary decrease in their salivary
CR’s, but the onset and duration of
this decrement varied. Two of the
animals (“excited types”)
showed a rise in CR’s preceding the
drop. The strength of the response
also varied as a function of the type
of CS. One of the animals, kept alive
for six months after the treatment,
exhibited a second lowering of CR’s
following the recovery from the first
decrease. This latency in the gross
pathological manifestations is con- 
sistent with the other investigations 
discussed in the previous section. 
The change in the CR’s occurred dur- 
ing a period when the S exhibited 
ataxia, impaired vision, circus move- 
ments, and general deterioration of 
behavior. It was difficult to test the 
dog. Interpretation of the data from 
the whole study is obscured by the 
observation that in three Ss not only 
the CR’s but the UR’s also showed a 
drop. 

In a study for which an abstract 
only is available, Harlow (28) re-
ports that radon tubes inserted into
the cortex of ten rhesus monkeys
produced progressive loss on delayed
reaction, patterned string tests, and
simple position habits. No data are
given for the dosage used. It was
probably quite large in view of other
negative findings reviewed below.

No further work was done in this 
field until after World War II. 
Furchtgott (16) tested rats exposed 
to 200-500 r. of total X radiation in a
four-unit water maze. Neither acqui- 
sition nor retention using several cri- 
terion measures was affected by the 
treatment. Arnold (3) exposed the 
heads only of rats to 300-800 r. and 
tested them for retention of a 14-
unit T-maze habit; other irradi-
ated Ss were tested for the learning
of the habit. No statistically signif- 
icant changes were found. Fields 
(14) studied performance on elevated 
runways, 32- and 40-choice-point
elevated T-mazes, and a 10-choice-5-
stage vertical maze of some 500 male
rats which had received doses rang- 
ing from 100-1,000 r. On the whole 
radiation had little effect on the per- 
formance of the animals except for a 
decrease in the speed and amount of 
activity immediately following irra- 
diation which was probably due to 
the general radiation malaise. Davis 
(11) tested rhesus monkeys in the
Wisconsin General Test apparatus
following X irradiation but was unable 
to find any impairment of perform- 
ance on discrimination-type tasks. 
In a series of studies sponsored by 
the U.S. Air Force School of Aviation 
Medicine (34, 57), monkeys were 
tested on acquisition, retention, and 
transfer of multiple discrimination 
problems immediately and 150 days 
after exposure to sublethal and lethal 
doses of X rays. Again the reported 
results failed to demonstrate any de- 
leterious effects. The only deficit 
that was noted was an increase in re- 
action time. 

Garcia et al. (18) established a con- 
ditioned aversion to a saccharine 
solution which was associated with 
exposure to gamma irradiation. Ex- 
perimental animals had saccharine 
solutions in their cages while being 
exposed for six hours in the gamma 
field, while control Ss had tap water. 
Preference was then tested for 63 
postirradiation days. The control 
group showed no loss of their natural 
preference for saccharine, while ex- 
perimental Ss exposed to only 30 r. 
showed a significant drop in their 
saccharine intake. The authors hy-
pothesize a general behavior dis-
turbance during radiation which be-
came associated with the taste stim-
uli. It should be pointed out that
the animals were being exposed at a 
very slow rate and some of the general 
radiation malaise might have been 
effective for a sufficient length of 
time for the conditioning. The effec- 
tiveness of the low dose used is sur- 
prising, however. 

Jones et al. (33) measured the ef-
fects of 200-1,000 r. of whole-body
X irradiation on activity-wheel per- 
formance, using 194 rats. Data were 
analyzed separately for animals who 
survived the eight-week experimental 
period and those that succumbed to 
radiation injury. Rats which died 
during the first nine postirradiation
days showed a gradual decrease in
activity until death. Those that sur- 
vived nine days, but died subse- 
quently showed a decrease immedi- 
ately after irradiation followed by a 
recovery and a second depression of 
activity prior to death. All of the 
surviving animals (doses 200-680 
r.—all animals with higher doses 
died) showed a decrease in activity 
postirradiation. The 200-300 r.
groups recovered completely by the 
fifth day. The higher-dose groups 
also showed a partial recovery dur- 
ing the first postirradiation week, but 
they exhibited a second depression 
during the third week. The 400-450 
r. groups attained normal levels of
activity four weeks after irradiation,
the 681 r. groups after eight weeks.
In general there was a direct rela-
tionship between degree and dura-
tion of activity depression.

In another study the same group 
of investigators (36) tested the ef- 
fects of 300-1,000 r. X irradiation on 
exhaustive swimming exercise. The 


nto a 24-gallon 


k re forced to swim 
until they were exhausted 


animals. The 500 r. group, however,
showed a significant drop and the
higher-dose animals in turn differed sig-
nificantly from the 500 r. group.
Furchtgott (15) subjected adoles- 
cent rats to 300 and 500 r. of X rays 
and tested their swimming speed in
a 12 ft. straight-away tank for 13
days. The 300 r. group did not differ 
from the controls, but the survivors 
in the 500 r. group were significantly
slower.

Vogel (67) daily irradiated with 50 
r. X rays six aggressive mice each of
which, prior to the treatment, always
defeated submissive animals. Even 
after irradiation the aggressive ani- 
mals continued to be dominant until 
shortly before death. 

McDowell (47) observed a reduc- 
tion in “other-animal involved” be- 
havior and visual attention to the 
activity of other animals following 
400 r. of X irradiation in 10 rhesus 
monkeys. The animals also showed 
fewer instances of aggression and a
greater incidence of lethargy. All of
these symptoms are easily under-
standable considering the general
malaise which is associated with ra-
diation.

Leary and Ruch (38) exposed 18 
rhesus monkeys to 200-400 r. of total-
body X irradiation. Cage crossings
were not affected. On the first post-
irradiation day, for the 400 r.
animals only (the others were not ob-
served), scratching, grooming, and oth-
er signs of activity were depressed, a
sign of general malaise. Mechanical
puzzle manipulation did not produce
statistically significant differences be-
tween pre- and postirradiation peri-
ods. Pedometer manipulation was
impaired in the 400 r. animals (others
were not tested) and, surprisingly,
weight-pulling, supposedly a measure
of general strength, did not decrease
in all animals.

In general it may be said that ra-
diation produces a certain amount of
depression in activity which should
be most apparent when motivation
is low or when the task requires a
great deal of effort as in the exhaus-
tive swimming experiment (36). The
latter effect would tend to parallel
the findings of Gerstner et al. (22) on the
effect of X irradiation on muscular
contraction.

There is a puzzling report of a 
clinical study of 120 patients who 
had received several single doses of 
30-50 r. during a 7-10 day period 
totaling 150-250 r. of diencephalic X 
irradiation (4). Immediately follow- 
ing the treatments the typical symp- 
toms were numbness, apathy, and 
tingling sensations in the head re- 
gion. The day after the irradiation, 
however, most patients reported spon-
taneously that they felt euphoric, ac-
tive, and generally tranquil. This
state lasted from a week to several 
months. Most of the treated pa- 
tients were neuropsychiatric cases 
with diagnoses of urticaria, migraine, 
depression, etc. However, in addi-
tion, two medical collaborators sub-
jected themselves to 100 r. adminis-
tered to the diencephalon and they 
also experienced the same changes as 
the patients. Sixty-one of the pa-
tients reported changes in their sleep
patterns. The sleep on the night fol-
lowing the treatment was usually
characterized as “extremely deep,”
“heavy,” or “leaden.” In addition
37.5 per cent of the Ss reported sexual 
changes, notably improvement in 
libido, potency, and the menses. 

The authors ascribe these changes 
to hypothalamic stimulation pri- 
marily of the anterior, parasympa- 
thetic nuclei, a finding in accord with 
the frequently reported radiation- 
induced vagotonia (66). These re-
sults, if confirmed by other investiga-
tors, should have therapeutic impli-
cations. They also raise many ques-
tions of interest to the experimental-
ist working with animals since we
have practically no data on emo- 
tional behavior following radiation. 

Summary. The lack of any dra-
matic changes in learning functions
following sublethal or just lethal
total-body X irradiations reported
by several experimenters agrees very
well with similar neurophysiological
observations on the resistance of the 
nervous system in that dose range. It 
takes doses which are well above the 
median total-body lethal range to 
produce any neural changes and then 
there is usually a considerable la- 
tency. In the one study in which 
there was an 18-month time lapse 
between the treatment and testing, 
no drastic decrements took place (14). 
Whether a longer period would have 
any effects is an open question. While
acquisition, retention, or transfer are 
not affected, performance indices 
which utilize gross muscular activity 
are impaired to some extent and this 
impairment persists for a number of 
months (in the study of swimming 
endurance [36] up to nine months for 
the most heavily irradiated group). 
Another factor to be considered is 
what might be called, for the lack of a 
better name, general malaise, which 
includes a lack of motivation to re- 
spond to stimuli or initiate activity 
which is present immediately follow- 
ing radiation and appears again dur- 
ing the second week in more heavily 
irradiated animals. This is accompa- 
nied also by a loss in appetite and 
drop in body weight. 


Sensory Functions 


Hearing. In the clinical literature 
there are reports of improved hearing 
following X irradiation. In the early 
thirties Girden (24) working in Cul- 
ler’s laboratory attempted to investi- 
gate this problem using dogs in the 
classical conditioning setup. Standard 
psychophysical procedures were em-
ployed to obtain absolute
thresholds. Subsequently the heads
of twelve animals were irradiated. 
The study was exploratory in nature 
and there was no systematic design 
to test radiation factors. Eight ani- 
mals were irradiated using 80-100 
kv. peak voltage and 5 ma., while 
four animals received roentgen rays gen-
erated at 200 kv. peak and 5 ma. One
animal received 5 r. every day for 
five months, one 5 r. for four days, 
one anywhere from 100-1,100 r. on 
seven days, spaced one to seven days 
apart, and so forth. The total dosage 
varied from 20-11,100 r. The ani- 
mals which were irradiated with the 
80-100 kv. rays all showed a tran- 
sient gain in acuity which averaged 
5.5 decibels after a latent period of 
seven to eleven days. Dosage was ap-
parently not involved since the changes
appeared even after the surprisingly
low value of 20 r. None of the Ss ir- 
radiated at 200 kv. showed any im- 
provement in acuity. 

In a second study Brogden and 
Culler (5) examined more critically 
the effect of dose and also the fre- 
quency variable. 
irradiated at nine differen 


se and it 
ays. The 


m of th 5 
phenomenon, two dogs were hy-
pophysectomized and irradiated; audi-


sured before 
one dog, 

all of these cases hypoglycemia was
associated with lower auditory thresh-
olds. The authors hypothesize that
low sugar levels lessen the density and
viscosity of cochlear fluids, and
thereby decrease resistance to incom-
ing vibrations, and perhaps also that the
ionic conditions in the cochlea affect
the magnitude of the cochlear poten-
tials.

Vision. Fields (14) found no ef-
fects on brightness or acuity discrim-
ination in rats following X irradia- 
tion. Russian workers (35) have re- 
ported that dermal X irradiation in- 
creases the threshold to dark adapta- 
tion and that this effect persists for 
several days. Lenoir (41) tested dark 
adaptation in 11 patients following 
therapeutic irradiation. In all cases 
there was a decrease in dark adapta- 
tion which was independent of the 
dose (2,400-6,240 r.). The changes 
could be detected for 20 to 36 days. 
The author ascribes this reduction in 
dark vision to a drop in vitamin A 
concentration which follows the X 
irradiation. Furchtgott (17) tested 
brightness discrimination in a Lash- 
ley jumping box under conditions of 
low illumination following 369-469 r. 
of X irradiation. The performance of 
the irradiated rats was slightly infe- 
rior to that of control animals. It 
should be noted here also that Cibis 
et al. (8) found that rod cells are con- 
siderably more radiosensitive than 
cones. Destruction of rods requires
1,700-2,000 r., while the threshold
for cones is 10,000-30,000 r.

The work on cataract formation
has been reviewed adequately (40)
and it is omitted here since the stud-
ies involve primarily morphological
changes.

Other senses. The work on other
senses is scant. Lindemann (44) ob-
served fifteen patients who received
therapeutic X-ray treatments for
tumors in the oral cavity. Taste
sensitivity and in some cases odor
sensitivity were depressed for several
months. In an unpublished study
Furchtgott found some indication of
lowered thresholds to electric shock
in rats following sublethal doses of
whole-body X irradiation.

Summary. While there is some evi-
dence for changes in sensory func-
tions, notably hearing and scotopic
vision, after irradiation, the available
data are quite limited. Much more
research is necessary in the various
sensory modalities to determine the
changes, if any, which affect percep-
tion.


SUMMARY AND CONCLUSIONS 


The published studies pertaining 
to the behavioral effects of high-en- 
ergy radiations were reviewed. More 
studies have actually been performed 
in this area. The author knows of 
several additional ones, performed 
by himself and by others, but the 
negative results have discouraged 
the workers from publishing them.

Underlying any discussion of the
behavioral effects of radiation is the
relative radioresistance of the adult
nervous system. Total-body doses
in the median lethal range do not
produce any gross neural dys-
function. Except for the instances
in which the body is shielded and the
radiations are applied to the head
only, death will intervene long before
neural changes can be observed.
We will not find any significant
changes in those activities
which are mediated directly by the
nervous system. We have reviewed
studies of learning by differ-
ent investigators which seem to bear
this out. Actually it is possible that
a decrement in learning will be
found following radiation.
This would be due primarily to a
change in the nonassociative
learning factors, i.e., moti-
vation and perception of the stimuli.
We have pointed out that radiation
produces changes in the blood and
body fluids, gastrointestinal tract,
and some of the endocrine secretions.
The homeostatic energy-con-
trolling mechanisms are affected and
we should find, therefore, changes in
motivation and performance. We 
have indeed seen that some of these 
functions are altered. Food and 
water intake, exhaustive swimming 
exercise, activity wheel and pe- 
dometer performance, and social be- 
havior changes have been reported. 
There are still a number of problem
areas here, such as emotionality and
motivation other than hunger and
thirst, which have not been investi-
gated. Here we should mention again
that radiation seems to lead to the 
pituitary-adrenocortical stress reac- 
tion and that the hypothalamus and 
the autonomic n.s. are relatively 
more sensitive than the cortex. It 
would seem also that performance 
which requires a large expenditure of 
energy or where extrinsic incentives 
are very small will be affected the 
most by radiations. 

In the sensory field some experi- 
mental work has been reported on 
hearing and vision and we have also 
clinical data on these and other 
modalities. On the whole, however, 
there are large gaps here. 

The great sensitivity of the de- 
veloping nervous system was briefly 
discussed. The quantity of behavi- 
oral data does not approach our 
knowledge of morphological changes. 
We have only two studies on maze 
learning in rats. It would seem that 
this area should be explored in greater 
detail and functions other than maze 
learning could be explored. 

The genetic aspects of radiation 
were not considered since we have no 
data here on variables which are con-
ventionally classified as psychologi-
cal.


COMMENTS ON MEEHL AND ROSEN’S PAPER¹


SAMUEL KARSON AND SAUL B. SELLS
USAF School of Aviation Medicine, Randolph Field, Texas 


The recent paper by Meehl and 
Rosen (3) presents a rationale for 
evaluating the predictive efficiency 
of psychometric instruments which 
should be of interest and importance 
in clinical and personnel research. 
The purpose of this comment is to 
emphasize the principle of the de- 
pendence of statistical criteria on 
administrative policy in selecting 
the appropriate criterion among the 
cases which they have effectively pre- 
sented. 

The basic reference statistic for
evaluation of predictive efficiency is
the base rate, according to Meehl and
Rosen (3, p. 194). Evaluation of any
predictor requires comparison of re-
sults based on prediction with the
base rates prevailing in the situation. 
Thus, if one thousand candidates 
were available for military service 
and the base rate of noneffectiveness 
were 5%, 950 successful candidates 
might be expected without screening. 
Now, if a screening device operated 
to admit less than 950 successful can- 
didates in the same situation, Meehl 
and Rosen would consider such a test 
less efficient than the base rate. 
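
The arithmetic of this example can be set down as a minimal Python sketch. The function names `expected_successes` and `falls_below_base_rate` are illustrative assumptions and are not taken from Meehl and Rosen; the pool size and base rate are the figures quoted above, and the screened outcome of 900 is hypothetical.

```python
def expected_successes(pool_size: int, noneffective_rate: float) -> int:
    """Successful candidates expected when the whole pool is admitted unscreened."""
    return round(pool_size * (1.0 - noneffective_rate))


def falls_below_base_rate(successes_admitted: int, pool_size: int,
                          noneffective_rate: float) -> bool:
    """True when screening admits fewer successful candidates than no screening."""
    return successes_admitted < expected_successes(pool_size, noneffective_rate)


# Figures from the text: 1,000 candidates, 5% base rate of noneffectiveness.
baseline = expected_successes(1000, 0.05)
print(baseline)

# Hypothetical screening device that admits only 900 successful candidates.
print(falls_below_base_rate(900, 1000, 0.05))
```

Under this reading any device that rejects even one potentially successful candidate falls below the unscreened figure, which is why the choice of criterion matters.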

Their analysis considers three sepa-
rate cases. The first is efficiency in
selecting cases of poor adjustment.
Here they classify as errors of predic-
tion only the false-positives rejected.
When the false-positive rate is higher
than the base rate of noneffectiveness,
they would conclude that use of the
screening test would be less efficient
than no screening at all. The second 


¹ The writers wish to express their appre-
ciation to Dr. Samuel Fulkerson for con-
tributing to the discussion which culminated
in the present paper.


case is efficiency in prediction for all 
cases. Here they classify as errors of 
prediction both the false-positives 
rejected and the false-negatives ac- 
cepted. When the number of success- 
ful cases attained through a sample 
of available individuals is lower as a 
result of screening than could be ex- 
pected according to the prevailing 
base rate, they would consider such 
screening inefficient. The third case 
is called efficiency in detecting cases of 
good adjustment. Here only false- 
negatives are regarded as errors. 
Thus to the extent that the propor- 
tion of successfuls in the sample ac- 
cepted is greater than expected ac- 
cording to the base rate, they would 
consider screening to be efficient. 
They point out, however, that such 
efficiency is relative, inasmuch as it 
purchases increased efficiency of per- 
sonnel accepted at the cost of reject- 
ing some potentially successful candi- 
dates in the screening process. 
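
The contrast between the second and third cases may be easier to follow with counts in hand. The sketch below is only an illustration under stated assumptions: the class `ScreeningOutcome`, its field names, and the four counts are hypothetical and do not come from Meehl and Rosen or from any actual screening program. "Positive" here means flagged as a poor-adjustment risk and rejected.

```python
from dataclasses import dataclass


@dataclass
class ScreeningOutcome:
    true_pos: int   # noneffective candidates correctly rejected
    false_pos: int  # potentially successful candidates wrongly rejected
    false_neg: int  # noneffective candidates wrongly accepted
    true_neg: int   # successful candidates correctly accepted

    @property
    def pool(self) -> int:
        return self.true_pos + self.false_pos + self.false_neg + self.true_neg

    @property
    def base_rate_success(self) -> float:
        """Proportion of the whole pool that would succeed without screening."""
        return (self.false_pos + self.true_neg) / self.pool

    def successes_without_screening(self) -> int:
        """Case 2 yardstick: successes obtained by admitting everyone."""
        return self.false_pos + self.true_neg

    def successes_with_screening(self) -> int:
        """Case 2 quantity: successes actually admitted after screening."""
        return self.true_neg

    def success_rate_among_admitted(self) -> float:
        """Case 3 quantity: proportion of successfuls in the accepted sample."""
        return self.true_neg / (self.true_neg + self.false_neg)


# Hypothetical pool of 1,000 with a 5% base rate of noneffectiveness.
outcome = ScreeningOutcome(true_pos=30, false_pos=150, false_neg=20, true_neg=800)

case2_inefficient = (outcome.successes_with_screening()
                     < outcome.successes_without_screening())
case3_efficient = outcome.success_rate_among_admitted() > outcome.base_rate_success
print(case2_inefficient, case3_efficient)
```

With these assumed counts the same device is inefficient by the second criterion and efficient by the third, and the 150 false positives are the cost at which the gain is bought.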
Although the point is implied by 
Meehl and Rosen, it seems important 
to emphasize as a general principle 
that the choice of the appropriate 
test of efficiency depends on the poli- 
cies in effect and the purposes of 
screening required to fulfil them. 
Widespread misunderstanding of this 
principle could seriously impair the 
status of many useful screening and
prediction programs. All too often 
scientists are too preoccupied with 
considerations of validity, while they 
fail to recognize the practical prob- 
lems facing administrators who uti- 
lize psychometric techniques. On the 
other hand, administrators need to 
understand this principle so that they 
may avoid the error of rejecting use-
ful methods as well as that of accept-
ing inefficient ones, through faulty
evaluation. 

With specific reference to induc- 
tion screening of military personnel, 
the manpower administrator is con- 
cerned with supply and demand is- 
sues on one hand, and with the bur- 
den of additional administration and 
loss of productive work due to non- 
effectiveness on the other. In times 
of manpower scarcity, he may be 
pressed to utilize every available 
man. Under such circumstances, he 
would seek to admit the maximum 
number from the pool available. 
Then the Meehl-Rosen Case 2 would 
be properly applied in evaluating 
prospective screening devices. 

If, however, manpower shortages 
were less pressing, or if the waste 
attributable to noneffectiveness were 
considered too great, the adminis- 
trator might be agreeable to the re- 
jection of some potentially successful 
individuals by a screening device
which could assure a greater propor-
tion of successful candidates from the
number admitted than might be ex-
pected according to the base rate.
The gross number of successful candi- 
dates for any available sample would 
be less, depending upon the rejection 
rate for the particular screening de- 
vice, but the noneffectiveness rate 
might be reduced. In these circum- 
stances Case 3 would be appropriate 
to evaluate the increase in proportion 
of successful candidates as a result of 
screening and Case 1 could be used to 
evaluate the cost in terms of false- 
positive rate. 

The criterion implied in Case 2 
requires maximization of the number 
of successfuls in relation to the total 
pool available, whereas Case 3 re-
quires maximization of the number of
successfuls in relation to the number 
admitted. Both need to be evaluated 
against the base rate. The former 
criterion may be dictated in circum- 
stances of manpower scarcity, while 
the other would reflect a policy de- 
cision more sensitive to the cost of 
accepting and caring for noneffective 
individuals in hospitals, guardhouses, 
and nonproductive jobs. Policy, not 
mathematical reasoning, must dictate 
the appropriate criterion of evalua- 
tion and the proportion of incorrect 
predictions which can be accepted. 
The writers feel that in view of the 
general excellence of Meehl and Ros- 
en’s paper, their oversight in connec- 
tion with their discussion of Case 
should be mentioned. They demon-
strate (3, p. 195) that the use of the
Danielson and Clark (2) screening
inventory would result in a decrease
in the total percentage of correct
predictions made (from 95% to
79.7%) when comparing the test with
the base rates. They do not, how-
ever, indicate that the screening in-
ventory has actually succeeded in
raising the percentage of correctly
predicted “fails” from 5% (base rate)
to 13%. Later they do recognize this
kind of gain when they demonstrate
(3, p. 204) that a certain cutting
score on the Glueck prediction index
succeeds in correctly identifying de-
linquents with an accuracy of 92.6%
as compared with an expected 20%
base rate, even though predictions are
made for only 2.4% of the popula-
tion.


² It is of interest to note that the current
induction screening policy of the armed serv-
ices emphasizes the second criterion de-
scribed (1, 4, 5).

There have been several recent in-
stances in the psychological literature
(1, 2, 6, 8, 9) of the use of the statistic
known as Kendall's tau (T), a non-
parametric correlation coefficient. It
is to be hoped that its use reflects a
growing realization among psycholo-
gists of the inadequacy of the Pear-
son product-moment coefficient (r)
in a number of circumstances. Some
of these circumstances are:

1. When the variates to be correlated show sharp departures from
normality. Although the distribution of sample r's from nonnormal
but uncorrelated populations differs only slightly from the normal
case (4), it may differ considerably when the true r is not zero,
kurtosis rather than skewness being the more important factor (3).

2. When the variates to be correlated are unmeasurable according
to an objective scale, as in the case of ratings or preferences of
judges, or when precise measurement is impractical and the raw data
must be sets of ranks. Under these circumstances, the evaluation and
interpretation of r often requires assumptions which it would be
imprudent to make.

3. When there is reason to believe that the regression of one
variate on the other is nonlinear, r will tend to underestimate the
degree of interdependence.

¹ The preparation of this paper was sup-
work was done while both authors were at
the Iowa Child Welfare Research Station.

The use of a rank-correlation coefficient requires no assumptions
regarding the form of the distribution of the variates and is thus
well suited to the resolution of the difficulties posed by the first
two circumstances. A rank coefficient also will not underestimate a
relationship when regression is nonlinear so long as the regression
function is monotonic, which is usually the case in psychological
research.

These considerations apply both to T and to the better-known rank
correlation, Spearman's rho. This paper, however, will be concerned
primarily with the former since it has a number of advantages over
rho, and is rarely discussed in current statistical texts. The most
important of these advantages is that the significance of a sample
tau (τ) can be evaluated with certainty in terms of the normal
probability integral for all but very small values of n. Furthermore,
confidence limits for T can be determined from sample τ's. If the
rank-order coefficient is regarded as merely a rough approximation of
r, these considerations are not particularly important. When it is
used as a test of an hypothesis for which it alone is appropriate, as
is often the case, then its advantages, especially the former, become
significant.

Tau can also be used for the computation of both partial and
multiple correlation coefficients. However, neither of these measures
is very useful at present as we shall indicate in a subsequent
section.


DEFINITION AND INTERPRETATION

Tau is defined as

$$\tau = \frac{S}{n(n-1)/2} = \frac{2S}{n(n-1)} \qquad [1]$$

where n is the number of items ranked and S=(P−Q), where P is the
number of item pairs on the order of which both rankings agree, and Q
is the number on which they disagree. Tau may vary from +1.00 when all
possible pairings are ranked concordantly, to −1.00 when all pairings
are ranked discordantly.

Consider the following rankings of the "ambiguity" of eight
sentences made by two judges, with the rankings of Judge A arranged in
the natural order:

Judge A   1  2  3  4  5  6  7  8
Judge B   5  1  6  7  2  3  8  4

The first sentence, i.e., the one ranked 1 by Judge A, has to its
right in Judge B's rankings 3 larger ranks and 4 smaller ranks. We
allot +1 for each of the larger ranks, and −1 for each of the smaller
ranks. The second sentence in Judge A's ranking has to its right in
Judge B's ranking 6 larger ranks and no smaller ranks. And so on
through sentence 7. Allotting pluses and minuses in this fashion, we
obtain: +3, −4; +6, −0; +2, −3; +1, −3; +3, −0; +2, −0; and +0, −1.
P, the sum of the pluses, is 17, and Q, the sum of the minuses, is 11.
S, which is P−Q, is thus 6. With n=8, we obtain according to [1] a τ
of 6/28=.21.
The interpretation of τ follows
readily from [1] since n(n−1)/2 is
the total number of item pairs with
respect to which the rankings can be
compared. A given value of T asserts


that the statement “The order in 
which two items are ranked accord- 
ing to one variate (or judge) will be 
the order in which they are ranked by 
another variate (or judge)” will be 
correct (100+100T)/2 per cent of the 
time, on the average. 
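
The counting procedure for untied rankings can be sketched in a few lines of Python. This is only an illustration, not part of the original paper: the function names are hypothetical, and Judge B's ranking (5, 1, 6, 7, 2, 3, 8, 4) is an assumption inferred from the pluses and minuses reported in the worked example.

```python
from itertools import combinations


def kendall_s(x, y):
    """Return (P, Q, S) for two untied rankings of the same items."""
    p = q = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:      # both rankings order the pair the same way
            p += 1
        elif product < 0:    # the rankings disagree on the pair
            q += 1
    return p, q, p - q


def kendall_tau(x, y):
    """Formula [1]: tau = 2S / (n(n-1))."""
    n = len(x)
    _, _, s = kendall_s(x, y)
    return 2 * s / (n * (n - 1))


# Assumed data: Judge B's ranks are inferred from the reported contributions.
judge_a = [1, 2, 3, 4, 5, 6, 7, 8]
judge_b = [5, 1, 6, 7, 2, 3, 8, 4]

p, q, s = kendall_s(judge_a, judge_b)
tau = kendall_tau(judge_a, judge_b)
percent_agreement = (100 + 100 * tau) / 2   # share of pairs ordered alike
print(p, q, s, round(tau, 2), round(percent_agreement, 1))
```

With these data the percentage of pairs ordered alike, (100+100τ)/2, equals P divided by the total number of pairs (17/28), which is the interpretation given in the text.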

When there are tied ranks, certain 
adjustments in the computational 
formula for τ must be made since the
total number of possible item pairs 
will vary as a function of the ties. If 
there are ties in only one of the rank- 
ings, we arrange the untied ranking 
in the natural order and proceed to 
compute S as before except that the 
numbers in the second ranking, to the 
right of the items under considera- 
tion, which are the same as the rank 
of this item contribute nothing to the 
value of S. When both rankings con- 
tain ties, we arrange either one in the 
natural order and compute conven- 
tionally except that item pairs which 
are tied in the upper ranking also con-
tribute nothing to the value of S.²

The major adjustment for tied ranks occurs in the denominator of
[1] as might be expected. The general formula for τ from tied ranks
containing the adjusted denominator is:

$$\tau = \frac{S}{\sqrt{n(n-1)/2 - V}\,\sqrt{n(n-1)/2 - U}} \qquad [2]$$

where

$$V = \tfrac{1}{2}\sum v(v-1)$$

and

$$U = \tfrac{1}{2}\sum u(u-1),$$

v and u being the extents of the sets of ties in the upper and lower
rankings respectively.

² Smith's (12) description of the method for calculating τ when ties
are present is in error since it neglects the effects of ties in the
upper ranking on S. This oversight leads to markedly unreasonable τ's
and distorts the sampling distribution by producing too many large
absolute values of τ. For instance, in one of Smith's examples
(12, p. 570), he obtains a corrected τ of +1.00 despite the fact that
one judge perceived differences between items which were rated
identically by the other. The correct procedure leads to a τ of .868,
which expresses the high, though not perfect, degree of agreement
which is present.
The computation of V and U will be illustrated in the following
example. In the two sets of ranks below, the upper ranking has been
arranged in the natural order.

1   2.5  2.5  4   6   6   6   8
2   6    6    6   2   6   2   6

The first item, i.e., the one ranked 1 in the upper ranking, has five
larger ranks to its right in the lower ranking, and none smaller. It
is not tied with any other item in the upper ranking, so its
contribution to S is +5. The second item has 2 smaller ranks to its
right, and thus contributes −2. Although this item is tied in the
upper ranking, the pairs which are tied are not involved in its
contribution. The third and fourth items likewise contribute −2 each.
The fifth item has two larger ranks to its right. One of these, the
sixth, is tied with it in the upper ranking, and thus contributes
nothing. The net contribution of the fifth item is therefore +1. A
similar procedure for the sixth item leaves it with a net contribution
of 0. The seventh item contributes +1. S, the net total, is 7−6=1.

V and U are obtained in the following manner: the upper ranking,
from which V is computed, contains two sets of ties, one of extent 2
and one of extent 3. For the first set, v=2, and v(v−1)=2(2−1)=2. For
the second, v=3, and v(v−1)=3(3−1)=6. The sum of the expressions
v(v−1) in the upper ranking is (2+6)=8, and V=½(8)=4. The lower
ranking also contains two sets of ties, of extents 3 and 5. These
enter into the computation of U. For these ties, u=3 and 5 and
u(u−1)=6 and 20. Hence U=½(6+20)=13.

Substituting the computed values of S, V and U in [2], we obtain

$$\tau = \frac{1}{\sqrt{28-4}\,\sqrt{28-13}} = .053$$
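
As an illustrative check on the tied-rank procedure, the following Python sketch computes S, V, U, and the corrected τ of [2]. The function names are hypothetical, and the lower ranking (2, 6, 6, 6, 2, 6, 2, 6) is an assumption reconstructed from the tie extents and item contributions described in the example.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def tie_term(ranks):
    """Half the sum of t(t-1) over sets of ties of extent t (V or U in [2])."""
    return sum(t * (t - 1) for t in Counter(ranks).values()) / 2


def tau_with_ties(x, y):
    """Return (S, V, U, tau) using the adjusted denominator of formula [2]."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        # A pair tied in either ranking gives product == 0 and adds nothing.
        s += (product > 0) - (product < 0)
    pairs = n * (n - 1) / 2
    v, u = tie_term(x), tie_term(y)
    return s, v, u, s / (sqrt(pairs - v) * sqrt(pairs - u))


# Assumed data: the lower ranking is reconstructed, not quoted from the paper.
upper = [1, 2.5, 2.5, 4, 6, 6, 6, 8]
lower = [2, 6, 6, 6, 2, 6, 2, 6]

s, v, u, tau = tau_with_ties(upper, lower)
print(s, v, u, round(tau, 3))
```

Because a pair tied in either ranking yields a zero product, the same loop handles ties in the upper ranking, the lower ranking, or both.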
If τ is computed without adjusting the denominator for ties, it
will always be numerically less than when the adjustments are made.
The use of the uncorrected formula is recommended by Kendall (7) when
agreement with an objective ranking is being determined. In such a
case, only the judge's ranking will contain ties, and these would
theoretically indicate inability to discriminate the objective order,
a failing for which the judge should properly be penalized. In
general, however, the corrected formula should be used since rank
correlations are usually computed when agreement rather than accuracy
is the issue.³

The procedure for adjusting for tied ranks can be generalized to
include cases involving dichotomies. A dichotomy may be regarded as
two sets of tied ranks of the extents of the number in each of the two
categories, and the computational procedure need not differ from that
for cases in which ties are less extensive. However, some labor can be
avoided by the use of the following formulae:

$$\tau = \frac{S}{\sqrt{n(n-1)/2 - V}\,\sqrt{pq}} \qquad [3]$$

when one of the variates is a dichotomy consisting of p and (n−p)=q
members in the categories, or

$$\tau = \frac{S}{\sqrt{(pq)(xy)}} \qquad [4]$$

when both variates are dichotomized into categories consisting of p
and q members, and x and (n−x)=y members. In this case, if we arrange
the frequencies in a 2×2 table as for correlated proportions, S will
be found to equal the difference between the products of the
frequencies in the diagonal cells.

³ If the computation of τ when ties are present seems tedious, it
should be pointed out that rho has no advantage in this respect. The
proper computation of rho from tied ranks also involves corrections in
numerator and denominator, the latter being similar in form and effort
to that required for τ. Unfortunately, most texts fail to mention, let
alone describe, the corrections for rho, thereby creating the
inaccurate belief that none are necessary.


TESTS OF SIGNIFICANCE⁴


The distribution of sample τ's for uncorrelated variables rapidly
approaches normality and is satisfactorily approximated, when n>10,
by the normal distribution with a mean of zero and a variance defined
as

$$\sigma_\tau^2 = \frac{4n+10}{9n(n-1)} \qquad [5]$$

When ties are present, the formula for the variance of τ becomes
complicated.⁵ If the number of ties is small, [5] may be used with
only a slight error. Since the correction for ties will invariably
reduce the variance, the use of the uncorrected formula will furnish a
more conservative test of the null hypothesis.

Kendall (7) provides probability tables for evaluating the
significance of an obtained S (rather than its τ) when n≤10. Values of
τ required for significance at the .10, .05, and .01 levels (or
beyond, since τ can take only a limited number of values) for n's from
4 through 10 are shown in Table 1.

When ties are present in one of the rankings, Sillitto's tables
(11) of the distribution of S for all possible numbers of pair and
triplet ties for small n's may be used. When other types of ties are
present, or when both rankings contain ties, the evaluation of τ is
not feasible if n is 10 or less.

⁴ In this paper all significance tests are attributable to Kendall
(7) unless there is a specific indication to the contrary.

⁵ The variance of τ when there are ties in both rankings is

$$\sigma_\tau^2 = \frac{2}{9n^2(n-1)^2}\Big\{n(n-1)(2n+5) - \sum v(v-1)(2v+5) - \sum u(u-1)(2u+5)\Big\} + \frac{4}{9n^3(n-1)^3(n-2)}\Big\{\sum v(v-1)(v-2)\Big\}\Big\{\sum u(u-1)(u-2)\Big\} + \frac{2}{n^3(n-1)^3}\Big\{\sum v(v-1)\Big\}\Big\{\sum u(u-1)\Big\}.$$

If only one ranking contains ties, this reduces to

$$\sigma_\tau^2 = \frac{2}{9n^2(n-1)^2}\Big\{n(n-1)(2n+5) - \sum u(u-1)(2u+5)\Big\}.$$

When one ranking is a dichotomy consisting of x and y members so that
(x+y)=n, the variance is

$$\sigma_\tau^2 = \frac{4xy}{3n^3(n-1)^3}\Big\{n^3 - n - \sum (u^3 - u)\Big\}.$$

The variance when one ranking is a dichotomy and the other contains no
ties is

$$\sigma_\tau^2 = \frac{4xy(n+1)}{3n^2(n-1)^2}.$$

When both rankings are dichotomies with x and y, and p and q members
respectively, the variance becomes

$$\sigma_\tau^2 = \frac{4xypq}{n^2(n-1)^3}.$$

The above formulae are to be found in Kendall (7).


CORRECTION FOR CONTINUITY


When the significance of τ is evaluated using normal probability
tables, it must be corrected for continuity, since S can not assume
all values within the range ±½n(n−1). Since n is fixed, an increase in
P is accompanied by a decrease in Q, and the minimum change in S is
thus 2. The appropriate correction for continuity is therefore to
subtract 1 from the absolute value of S. This is equivalent to a
deduction of 2/n(n−1) from the absolute value of τ, and the correction
may be applied at either point.

This simple correction is appropriate when neither distribution
contains ties, or when only one has ties. When one ranking consists
entirely of ties of extent u, and the other ranking is a dichotomy,
the correction for continuity consists of subtracting u from S, or
2u/n(n−1) from τ. If both variates are dichotomies, the deduction for
continuity from S is ½n, or 1/(n−1) from τ.

In instances where both rankings contain ties but are not
dichotomies, there is no simple way of applying a correction.
Whitfield's proposed correction (13) for the case in which one variate
is a dichotomy and the other contains ties of varying extents might be
used for the general case of ties in both rankings. Whitfield's method
involves arranging the undichotomized ranking in the natural order and
subtracting the extent of the ties involving the smallest and the
greatest rank from twice the number of items ranked. This quantity is
then divided by the number of intervals in the ranking. One-half of
this quotient is the deduction from S for the correction. If τ is
corrected instead of S, the deduction is the quotient divided by
n(n−1). The formal expression for this correction for S is

$$\frac{2n - v_1 - v_2}{2n_i} \qquad [6]$$

where n is the number of items ranked, v₁ is the extent of the tie
involving the smallest rank, v₂ is the extent of the tie involving the
largest rank, and nᵢ is the number of intervals in the ranking. (If a
ranking had no ties, nᵢ=(n−1); in a dichotomy, nᵢ=1.) In our
illustrative problem (p. 340), n=8, v₁=1, v₂=1, and nᵢ=4. Accordingly,
the deduction from S would be

$$\frac{2 \times 8 - 1 - 1}{2 \times 4} = 1.75.$$

MAURICE S. SCHAEFFER AND EUGENE E. LEVITT

The generalization of Whitfield's procedure to the general case of
ties in both rankings is apparently not a simple matter, and it has
not yet been accomplished. A suggestion would be to consider the
ranking with the fewest intervals (and the most tied ranks) as a
dichotomy, and to apply Whitfield's correction. This actually would
provide an overcorrection for continuity and hence a safer test of the
null.


CONFIDENCE LIMITS OF T


It is often desirable to establish confidence limits for the
parameter correlation when a significant sample coefficient has been
obtained. For any value of a population T, the sampling distribution
of τ tends rapidly toward normality (though not as rapidly as in the
null case), provided that the absolute value of T is not too close to
unity. The mean of this distribution is the population T, but the
variance cannot be exactly determined unless something is known about
the arrangement of ranks in the population, information which is
almost always lacking. However, it can be shown that for any parameter
T, the variance of τ cannot exceed the value

$$\text{maximum } \sigma_\tau^2 = \frac{2(1-T^2)}{n} \qquad [7]$$

Confidence limits of T can be set by substituting the value of the
sample τ in [7]. An alternate method is to solve equation [8] with the
roots providing the limits. The value of x is the normal deviate
corresponding to the desired probability level.

$$(\tau - T)^2 = \frac{2x^2}{n}\,(1 - T^2) \qquad [8]$$

Since the limits determined by means of [7] and [8] are based on a
maximum variance, the probability is at least, but not precisely,
(1−P) that the true T lies within those limits. Unless n is fairly
large, the magnitude of the limits will often be so great as to render
them practically useless. Kendall (7) has developed an additional
method which involves the estimation of a parameter representing the
arrangement of ranks in the population from the obtained sample. While
this method frequently results in tremendous reductions in the extent
of the confidence limits, it is too complicated and laborious for
ordinary use.


SIGNIFICANCE OF A DIFFERENCE
BETWEEN τ's


Evaluating the significance of a difference between two
independent τ's presents no special problems since the differences
will be approximately normally distributed around a mean of zero in a
test of the null hypothesis. The critical ratio which is
conventionally used in such situations is applicable to τ. The
standard error of the difference is, as usual,
$\sqrt{\sigma_{\tau_1}^2 + \sigma_{\tau_2}^2}$, where $\sigma_\tau^2$
is computed by [7].

If we wish to avoid using the sample τ as an estimate of T in
computing the variance, we have recourse to a transformation called w,
which is defined as sin⁻¹ τ, in radians. Kendall (7) has shown that
the sampling variance of w can be maximized at 2/n, a value
independent of the parameter T. The standard error of the difference
between w₁ and w₂ can be maximized at

$$\sqrt{\frac{2}{n_1} + \frac{2}{n_2}}$$

an expression which does not require an estimation of population w's
from the data.

The w transformation may also be used to set confidence limits for
a T, though there is no reason to feel that this would be a desirable
practice. Limits set in this manner, while differing slightly from
those determined by [8], cannot be said to be more accurate, since it
is not known whether the distribution of w is nearer normality than
that of τ. Furthermore, the computations involved in converting from
τ to w and back again may very well exceed those required in solving
[8] to obtain the limits.


A COMPLETE COMPUTATIONAL 
EXAMPLE 


Consider the following set of rankings where the first has been
arranged in the natural order:

1  2  3   4  5  6  7  8  9  10
6  8  10  9  7  5  2  4  1  3

Computing S, we obtain +4, −5; +2, −6; 0, −7; 0, −6; 0, −5; 0, −4;
+2, −1; 0, −2; and +1, 0. The total for P is 9, the total for Q is 36,
and S=−27. For the denominator, ½n(n−1)=½(10)(9)=45. According to [1],
τ=−27/45=−.60. Entering Table 1 with an n of 10, we find that a τ of
.60 is significant beyond the .05 level. The precise p value is .0166.

If we wish to use the normal approximation, we require the standard
error of τ, and we must correct τ for continuity. From [5], we compute
the variance of τ as .0617, and the standard error, .248. Applying the
continuity correction at S, we recompute τ from [1] thus:
(−27+1)/45=−.578. Or, correcting τ itself: −.60+2/(10)(9)=−.578.
(Since the correction is subtracted from the absolute value of S or τ,
we add it to a negative statistic.) The critical ratio of τ is thus
−.578/.248=−2.33, which corresponds to a probability of .0198.
Comparing this value with the probability obtained from Table 1, we
see that the normal approximation is slightly in error when n is as
small as 10, though it provides a somewhat more stringent null test.
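
The normal-approximation test just described can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, the rankings are untied, the variance comes from [5], and the two-tailed probability is obtained from the standard-library `math.erfc` rather than from printed normal tables.

```python
from math import erfc, sqrt


def tau_normal_test(s, n):
    """Continuity-corrected normal test of S for untied rankings."""
    pairs = n * (n - 1) / 2
    variance = (4 * n + 10) / (9 * n * (n - 1))   # formula [5]
    shrunk = max(abs(s) - 1, 0)                    # subtract 1 from |S|
    tau_corrected = (shrunk if s >= 0 else -shrunk) / pairs
    z = tau_corrected / sqrt(variance)             # critical ratio
    p_two_tailed = erfc(abs(z) / sqrt(2))          # area in both tails
    return tau_corrected, z, p_two_tailed


tau_c, z, p = tau_normal_test(-27, 10)
print(round(tau_c, 3), round(z, 2), round(p, 3))
```

For S=−27 and n=10 the sketch gives a corrected τ near −.578 and a critical ratio near −2.33, in line with the worked example; the probability differs from .0198 only in the fourth decimal because the ratio is not rounded before the tail area is taken.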

To set the confidence limits of T at the .05 level and beyond, we
solve [8] with τ=−.60 and x=1.96. The roots of the quadratic are −.93
and +.25, which are the limits of T. The finding is hardly
illuminating, though not unexpected. Any correlation based on only 10
instances is bound to be an uncertain estimate of the population
value. If we had used [7] to compute the limits, we would obtain a
maximum standard error of .358 and limits of −.60±.70 at the .05 level
or beyond.
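
Both ways of setting limits can be sketched in Python. The sketch assumes that [7] is the maximum variance 2(1−T²)/n and that [8] is the quadratic in T obtained by setting (τ−T)² equal to x² times that maximum variance; the function names are hypothetical. Under those assumptions the quadratic reproduces the limits of −.93 and +.25 reported above.

```python
from math import sqrt


def limits_by_max_variance(tau, n, x=1.96):
    """Substitute the sample tau in [7] and take tau +/- x standard errors."""
    se = sqrt(2 * (1 - tau ** 2) / n)
    return se, tau - x * se, tau + x * se


def limits_by_quadratic(tau, n, x=1.96):
    """Solve (tau - T)^2 = (2 x^2 / n)(1 - T^2) for T (assumed form of [8])."""
    k = 2 * x ** 2 / n
    a, b, c = 1 + k, -2 * tau, tau ** 2 - k
    root = sqrt(b * b - 4 * a * c)
    return (-b - root) / (2 * a), (-b + root) / (2 * a)


se, low7, high7 = limits_by_max_variance(-0.60, 10)
low8, high8 = limits_by_quadratic(-0.60, 10)
print(round(se, 3), round(low7, 2), round(high7, 2))
print(round(low8, 2), round(high8, 2))
```

The limits from the direct substitution can fall outside the range −1 to +1, which the quadratic solution avoids.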


PARTIAL RANK CORRELATION 


A procedure for
tial τ when there
plets in each ranking, i.e., items 1 and 2, 1 and 3, ... 1 and n, 2
and 3, etc. In Judge C's ranking, the order of magnitude of each
couplet is the same; the one to the right is the larger. We determine
(a) the number of couplets on which both Judge A and Judge B agreed
with Judge C as to order, (b) the number of couplets on which both
disagreed with Judge C, (c) the number on which A agreed and B
disagreed, and (d) the number on which B agreed and A disagreed. These
frequencies are now arranged in an ordinary 2×2 contingency table and
the partial τ of the rankings of Judges A and B independent of that of
Judge C is defined as

$$\tau_{AB \cdot C} = \frac{ab - cd}{\sqrt{(a+c)(a+d)(b+c)(b+d)}} \qquad [9]$$

It so happens that

$$\tau_{AB \cdot C} = \frac{\tau_{AB} - \tau_{AC}\,\tau_{BC}}{\sqrt{(1-\tau_{AC}^2)(1-\tau_{BC}^2)}} \qquad [10]$$

an expression which is analogous to that for the product-moment
partial correlation coefficient. It happens further that
$\tau_{AB \cdot C} = \sqrt{\chi^2/n}$, which illustrates the
relationship between partial τ and the phi coefficient.

Examples of the computation of partial τ using [9] can be found in
Kendall (7) and Smith (12). The latter's example, though correct in
form, contains arithmetic errors so that the computed partial is
inaccurate. For most purposes [10] will be the more useful
computational method.


TABLE 1

VALUES OF τ REQUIRED FOR SIGNIFICANCE AT THE .10, .05 AND .01 LEVELS AND BEYOND*

Level    n=4    5      6      7      8      9      10
.10      1.00   0.80   0.73   0.62   0.57   0.50   0.47
.05      -      1.00   0.87   0.71   0.64   0.56   0.51
.01      -      -      1.00   0.90   0.79   0.72   0.64

* Based on Kendall's (7) tables.
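
The two routes to a partial τ can be compared on a small untied example. The sketch below is illustrative only: the function names and the three-item rankings are hypothetical, [9] is taken as the fourfold-table expression in a, b, c, and d as defined in the text, and [10] is assumed to have the same form as the product-moment partial correlation.

```python
from itertools import combinations
from math import sqrt


def tau(x, y):
    """Formula [1] for untied rankings."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        s += (product > 0) - (product < 0)
    return 2 * s / (n * (n - 1))


def partial_tau_from_taus(t_ab, t_ac, t_bc):
    """Assumed form of [10], analogous to the product-moment partial."""
    return (t_ab - t_ac * t_bc) / sqrt((1 - t_ac ** 2) * (1 - t_bc ** 2))


def partial_tau_from_counts(a_ranks, b_ranks, c_ranks):
    """Formula [9]: classify each couplet by agreement with Judge C."""
    a = b = c = d = 0
    for i, j in combinations(range(len(c_ranks)), 2):
        c_sign = c_ranks[i] - c_ranks[j]
        a_agrees = (a_ranks[i] - a_ranks[j]) * c_sign > 0
        b_agrees = (b_ranks[i] - b_ranks[j]) * c_sign > 0
        if a_agrees and b_agrees:
            a += 1          # both agree with C
        elif not a_agrees and not b_agrees:
            b += 1          # both disagree with C
        elif a_agrees:
            c += 1          # A agrees, B disagrees
        else:
            d += 1          # B agrees, A disagrees
    return (a * b - c * d) / sqrt((a + c) * (a + d) * (b + c) * (b + d))


# Hypothetical rankings of three items by Judges A, B, and C.
judge_a, judge_b, judge_c = [1, 3, 2], [2, 1, 3], [1, 2, 3]

by_counts = partial_tau_from_counts(judge_a, judge_b, judge_c)
by_taus = partial_tau_from_taus(tau(judge_a, judge_b),
                                tau(judge_a, judge_c),
                                tau(judge_b, judge_c))
print(round(by_counts, 3), round(by_taus, 3))
```

Both functions fail with a division by zero when either judge agrees perfectly with Judge C, since a marginal total of the fourfold table is then zero.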

The use of partial τ when ties are present is questionable since
[9] and [10] will give different results in such instances. This
drawback, added to the fact that generally applicable tests of the
significance of any partial τ are not yet available, limits the value
of the statistic.⁶

An expression for a multiple τ has been developed by Moran (10),
but the problems of the sampling distribution of multiple τ, although
apparently simpler than those of partial τ, have also not yet been
solved. The usefulness of multiple τ, like that of partial τ, is
limited at the present time.


THE RELATIONSHIP
BETWEEN τ AND r


When ranked data can be assumed to be based on continuous, normal
distributions and n is fairly large, an estimate of the parameter
product-moment coefficient can be obtained by means of a
transformation of τ. The formula for this transformation is

$$r = \sin\frac{\pi\tau}{2} \text{ (radians)} = \sin 90\tau \text{ (degrees)} \qquad [11]$$

The significance of the estimated r can be tested by simply testing
the τ from which it was derived, using normal tables and a variance
computed by [5].
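
The transformation [11] is a one-line computation. The following sketch, with a hypothetical function name, also checks that the radian and degree forms agree.

```python
from math import pi, radians, sin


def r_from_tau(tau):
    """Formula [11]: estimate r from tau, assuming normal parent variates."""
    return sin(pi * tau / 2)


for t in (0.0, 0.21, 0.5, 0.6):
    in_radians = r_from_tau(t)
    in_degrees = sin(radians(90 * t))   # same quantity, degree form
    print(t, round(in_radians, 3), round(in_degrees, 3))
```

A τ of .50, for example, corresponds to an estimated r of sin 45°, or about .71, so the estimated r is numerically larger than τ everywhere except at 0 and ±1.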

In the nonnull case, the distribution of sample τ's will be
approximately normal for large n's, with a mean of T and a maximum
variance of

$$\text{maximum } \sigma_\tau^2 = \frac{5(1-T^2)}{9n} \qquad [12]$$

Confidence limits for T can be obtained using this variance, and
corresponding limits for the transformed r are computed by translating
the limits of T into those for r using [11].⁷

⁶ Hoeffding (5) shows that when neither T_AC nor T_BC is unity, the
distribution of √n(τ_AB·C − T_AB·C) is approximately normal for large
n's with a mean of zero and a variance given by an expression which he
derives. Furthermore, when T_AC and T_BC are zero, the distribution of
√n(τ_AB·C − T_AB·C) is the same, in the limit, as that of
√n(τ_AB − T_AB).

A comparison of the upper limit of the variance of τ by [12] when
normality is assumed with its upper limit by [7] when no assumptions
are made will show that the assumption of normality decreases the
standard error of τ by approximately 50 per cent in the nonnull case.
On the other hand, if τ is used to estimate r when the latter could be
computed directly from the data, there will be a considerable loss of
sensitivity since the standard error of the former is always greater
than that of the latter. The ratio of the standard errors will vary
from 1.2 when the variates are uncorrelated up to approximately 1.9
when the true r is .90.

The conversion formula for r from τ is justified only by the
assumption of normality of distribution of the variates, and when n is
fairly large. Otherwise, it would seem advisable to avoid estimating r
from ranked data, and to limit the conclusions to statements
concerning τ.


⁷ A standard error for r computed from τ
can be derived using the conversion formula
(7). Its upper limit is

y2 —?)(1—1°) z 


n-1 


The procedure for setting limits for r by converting limiting τ's
into limiting r's is, however, preferable because of the greater
symmetry of the distribution of τ.

THE WATER-JAR EINSTELLUNG TEST AS A
MEASURE OF RIGIDITY¹

The water-jar Einstellung test was first used experimentally by
Karl Zener and Karl Duncker at the University of Berlin in the 1920's.
It was formally introduced into American psychology by Luchins in 1942
(34). The test consists essentially of a series of simple arithmetic
problems couched in terms of three water-jars, each with a known
maximum capacity. The S is required to manipulate the jars so as to
obtain a given quantity in one of them. No other measure except the
maximum capacity of the jars can be used. The first series of problems
presented to the S is solved by filling the largest jar, then emptying
it twice into one of the smaller jars, and once into the other. If the
jars are labeled A, B, and C, the solution, which is the simplest
available, follows the form B−A−2C. These problems are called set, or
Einstellung problems since their intent is to habituate the S to the
B−A−2C solution. An example set problem is shown below; capacities are
noted on each jar.


| 50 |    | 81 |    | 7 |     Obtain 17

  A         B         C


¹ This study was begun while the author was associated with the Iowa
Child Welfare Research Station. During that time, its preparation was
supported by a research grant from the National Institute of Mental
Health, U. S. Public Health Service. The author is indebted to Dr.
Milton Rokeach for advice and encouragement, and for a critical
reading of the first draft of this paper. However, the author is
solely responsible for all general conclusions stated in this article.


Immediately after the set problem, 
the S is asked to solve a second group 
of problems for which the habituated 
solution suffices, but which also may 
be solved by a more direct method, 
usually A — C, or A+C. These prob- 
lems are called critical, or test prob- 
lems. Their original purpose was to 
show the effect of the set engendered 
by the Einstellung problems. The 
use of the set solution for the critical 
problems was taken as evidence of 
the establishment of a set. An ex- 
ample critical problem is shown be- 
low. 
[Figure: example critical problem with jars A, B, and C]


In his 1942 monograph (34) Lu- 
chins added a third type of problem, 
the extinction problem. Extinction 
problems are amenable to solution 
by the direct A−C and A+C
method, but not by the indirect
B−A−2C method. The extinction
problem was originally conceived as 
an attempt to break the Einstellung; 
its success or failure was determined 
by a subsequent group of critical 
problems. An example extinction 
problem is shown below. 


[Figure: example extinction problem. Obtain 26]


Luchins has consistently regarded
the water-jar test as a paradigm of
human learning. In 1942 (34) and as 
recently as 1954 (40), he made clear 
that he believes that the primary 
value of the test lies in its educa- 
tional implications rather than in 
clinical use. Nonetheless he ap- 
parently developed some kind of 
clinical index of rigidity from the 
water-jars (37),² although he has
never published any norms or other 
developmental data. In 1948, Roke- 
ach (47) published the first account 
of the water-jar test as an experi- 
mental rigidity measure. He re- 
ported that there was a relationship 
between the number of short or di- 
rect solutions of the critical problems 
and scores on the California E scale. 
The relationship was in accord with 
certain theoretical considerations in 
which the ethnocentric individual is 
conceived of as having a generally 
rigid personality structure.

In Rokeach's final form of the water-jar test, extinction problems
were not used, and the set solution of the critical problem became the
definition of rigidity. Rokeach added the control problem, one of
Luchins' variations (34), in his design. A control problem is a
critical which is administered prior to the presentation of the set
problems. Ss who failed to solve the control problem by the short
method were eliminated from the experiment. The expressed purpose of
this problem was to have Ss "... demonstrate their ability to solve a
critical problem by the simple method" (47, p. 263).


The publication of Rokeach's findings provided the impetus for a
considerable amount of experimental work with the water-jar test. It
also gave rise to a controversy between Rokeach and Luchins
(Cf. 36, ...) which has not yet been resolved. The center of the
dispute was Rokeach's exclusion of the extinction problem from his
form of the test. Luchins has emphasized (36, 38, 39) that the
extinction problem is the proper mechanism to tap rigidity since its
solution requires a shift of method while that of the critical does
not. If rigidity is defined as "the inability to change one's set when
the objective conditions demand it," only the extinction problem
logically fulfills the requirements of the definition. The contention
appears reasonable especially in light of evidence (...) that Ss who
use the short critical solution do not work any faster than those
using the indirect solution. However, the controversy cannot be
satisfactorily settled by logic alone. Logic is a secondary
consideration when experimental investigation is possible.

² Luchins' manual has been out of print since 1950, and only a limited
number of copies was ever printed (Cf. 29). The present writer has so
far been unable to obtain a copy to determine whether developmental
test data are presented. The test form is probably similar to those
suggested by Luchins elsewhere (35).

The primary purpose of the present paper is to examine the validity
of the water-jar test as a rigidity measure by critically reviewing
studies involving its use as such an index. It is hoped that the
controversy between Luchins and Rokeach will be resolved along the
way.

Since this review is concerned with the water-jar test as a
rigidity measure, a number of studies (6, 28, ..., 41, 49, 57, 58) in
which the test was manipulated to investigate effects on Einstellung,
but in which it was not used as a rigidity measure, will not be
considered here.

TEST VARIATIONS

The expression "water-jar test" has been used thus far in its
generic sense, for there are actually a number of different
experimental forms of the test. We can distinguish four basic types
which differ among themselves with respect to the kinds of problems
which make up the form. The simplest of these we have labeled the
Zener-Duncker form;³ it consists of a series of set problems followed
by a series of criticals. The Luchins form consists of sets,
criticals, and extinctions. A modification of this form adds a series
of criticals following the extinctions. The Rokeach form has a control
critical problem followed by sets and criticals. The Cowen form
consists of the modified Luchins form preceded by the control
critical. A modification of this form has a set problem inserted
between the first criticals and the extinctions. The sequences of the
various forms are shown in Table 1. A few unlisted variations have
also been used occasionally.


TABLE 1
SEQUENCE OF TYPES OF PROBLEMS IN FORMS OF THE WATER-JAR TEST

Zener-Duncker   Luchins       Luchins (modified)   Rokeach     Cowen         Cowen (modified)
sets            sets          sets                 control     control       control
criticals       criticals     criticals            sets        sets          sets
                extinctions   extinctions          criticals   criticals     criticals
                              criticals                        extinctions   set
                                                               criticals     extinctions
                                                                             criticals


In addition to the different forms of the test, there are also
various operational measures of rigidity which are derived from the
forms.

Most of the water-jar studies use one or more of four primary
measures. A fifth measure, time of solution of a problem, can be
applied to any of the four, though in practice it is usually an
extinction. These measures are listed below in the form in which a
high score indicates rigidity. The individual experimenter may compute
the opposite or nonrigidity score. Each measure is followed
parenthetically by an appropriate abbreviation which will be used to
represent it in the remainder of this paper.

1. Number of critical problems solved by the long, or indirect
method (Cr).

2. Number of failures to solve extinction problems (Ex).

3. Number of long critical solutions and number of extinction
failures pooled into a single score (CrEx).

4. A specified number of long critical solutions and extinction
failures used as a multiple cutoff point to distinguish a "rigid"
group of Ss (Cr+Ex).

³ The forms are identified by the name of the person who first used
each one, as nearly as can be determined from the literature.

Considering combinations of form 
and measure, nine different opera- 
tional definitions of rigidity (exclu- 
sive of type of administration) based 
on the water-jar test have been used 
in correlational studies. Two of 
these definitions involve time meas- 
ures. Since the number is small, 
these will be included with their re- 
spective form-measure combination 
in compiling results, leaving only 
seven definitions. 

The volume of rigidity studies is 
too limited to furnish a conclusive 
evaluation of the predictive validi- 
ties of so large a number of defini-
tions. However, there are sufficient 
studies to permit at least an inspec- 
tional comparison. To facilitate this 
comparison, thirty-one investigations 
of the relationship between the water- 
jar test and criterion indices have 
been classified according to signifi- 
cance of results. A study is classified 
as positive if more than 75 per cent 
of the reported correlations were sig- 
nificant at the .05 level or beyond. 
If less than 25 per cent were signifi- 
cant, it is classified as negative. The 
remaining studies are considered am- 
biguous. 
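
The classification rule just described can be written out as a short sketch. The function name and the handling of the exact 25 and 75 per cent boundaries (treated here as ambiguous, since the text says "more than" and "less than") are assumptions made for illustration, not part of the original review.

```python
def classify_study(n_significant, n_reported):
    """Classify a study by the share of its reported correlations
    that reach the .05 level (hypothetical helper, for illustration)."""
    if n_reported <= 0:
        raise ValueError("a study must report at least one correlation")
    share = n_significant / n_reported
    if share > 0.75:       # more than 75 per cent significant
        return "positive"
    if share < 0.25:       # less than 25 per cent significant
        return "negative"
    return "ambiguous"     # everything in between, boundaries included


# One significant result out of three falls in the middle band.
print(classify_study(1, 3))
```

Under this rule a study with a single reported correlation can only be positive or negative, which is worth keeping in mind when reading the frequency tables.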

A breakdown of the significance of results as a function of form, measure, and method of administration, each independently, is shown in Table 2. A similar breakdown according to combinations of form and measure is given in Table 3. Studies making up the frequency in each category of Table 3, and in the form analysis of Table 2, are listed parenthetically alongside the frequency. The frequencies in both tables are obviously too small to warrant a statistical analysis, but inspection suggests that no particular form, measure, type of administration,⁴ or combination of form and measure is superior to any of its fellows. This cautious conclusion derives additional support from the fact that two of the three positive experiments using the Rokeach form (53, 54) stem

⁴ Experience with the WJT suggests that it is quite sensitive to various conditions of administration, especially the instructions to Ss. However, these conditions are not specified in many of the studies, so that evaluation of their effects is not worth attempting.


TABLE 2
BREAKDOWN OF RESULTS ACCORDING TO FORM,* MEASURE, AND ADMINISTRATION

                                 Frequency of
                  Positive         Ambiguous       Negative
                  Results          Results         Results                        Total
Form
  Zener-Duncker   0                3 (8, 21, 31)   3 (20, 43, 59)                    6
  Luchins         1                5               7 (2, 4, 23, 26, 30, 32, 39)     13
  Rokeach         3 (47, 53, 54)   1 (15)          3 (22, 27, 55)                    7
  Cowen           1 (16)           1 (25)          3                                 5
  Total           5                10              16                               31
Measure
  Cr              3                5               9                                17
  Ex              1                1               4                                 6
  CrEx            2                2               4                                 8
  Cr+Ex           1                0               3                                 4
  Total           7                8               20                               35
Administration
  Group           1                6
  Individual      1                2
  Not Stated      3                2               3                                 7
  Total           5                10              16                               31


TABLE 3
BREAKDOWN OF RESULTS ACCORDING TO COMBINATIONS OF FORM AND MEASURE*


Frequency of 


Combination = 
Positive Ambiguous Negative Total 
Results Results Results 
L (C 
J Cko 0 0 3 (1, 234) 3 
L (ay i (a2) 2 ar, 42) 2 (4, 26) 5 
L ‘ 4 (23, 30f, 32, 42 
(Cr+Ex) 1 (50) ù 3 get , 42) s 
3 = E He = 
otal 3 3 11 17 
C (C 
E eee ? (16) 105) ? : 
C (Cr+Es) 0 0 ign 19 i 
Total q T 3 Te 
5 
Z-D (C; 
R (Cr) "9 0 3 (8, 21, 31) 3 (20, 43t, 59) 6 
3 (47, 53, 54) 1 (15) 3 (22, 27, 55) * 7 
Over-all Total 7 3 20 35 


Note. L = Luchins form, C = Cowen form, Z-D = Zener-Duncker form, R = Rokeach form.
* Studies making up the frequency in each category are shown parenthetically.
† Two separate Cr measures in this study: the Cr's preceding an Ex, and those following the Ex.
‡ Time measure.


from a single doctoral investigation (52). This same dissertation also swells the positive results category for the Cr measure, and for “Not Stated” under method of administration.

On the basis of the data of Tables 2 and 3, it seems reasonable as well as expedient to regard the various operational definitions based on the water-jar tests as equivalent.⁵ Test variation will hence be ignored in the discussion in the remainder of this paper, and results from different investigations will be pooled, when necessary, regardless of differences in form and measure.

Criticals vs. extinctions. The data in Tables 2 and 3 also provide a means of evaluating Luchins’ contention that the Ex problem rather than the Cr is the proper unit for the measurement of rigidity. Only one of six instances of the use of Ex furnished a significant relationship, the lowest percentage of any of the four measures. It is represented by a single chi square based on a small group of Ss. If we include the Cr+Ex measure,⁶ the ratio becomes two of ten instances, or 20 per cent compared with 18 per cent for Cr and 25 per cent for CrEx. A total of three out of twelve individual correlational analyses using Ex and Cr+Ex measures attained the .05 level of significance, and one of these (50) is a questionable result. As we shall see, this proportion is about the same as that of significant correlational analyses with all forms (see p. 352, below). Evidently, the extinction problem is no more predictive than the critical problem when both are evaluated against independent criteria.

⁵ This does not mean that the forms are equivalent in every sense. Means and variances furnished by the forms are not necessarily the same. The equivalence is one of predictiveness (or lack of predictiveness) rather than of descriptive data.

⁶ Luchins (39) has recently stated that the Cr+Ex measure should be a satisfactory rigidity index.

The experiments of Guetzkow (23) 
are usually cited as experimental evi- 
dence that the Cr and Ex are differ- 
ent measures. He reported that men 
and women Ss perform similarly on 
the Cr's, but that a significantly 
higher proportion of men solved the 
Ex. On this basis, Guetzkow sug- 
gests that there are two different proc- 
esses and causal factors involved; 
the Cr is concerned with acquisition 
of set, while the Ex is involved with 
surmounting the set. The data them- 
selves offer no real basis for such a 
notion. They merely show the differential performance of the sexes. Seventy-eight per cent of the Ss who used the short Cr solution also solved
the Ex. Guetzkow feels that this 


lends weight to his different process 
idea since there w. 


problems. A 
tween the two 
Ported elsewhere (32). 
can hardly lead one to conclude that 

and verified distinc- 
t I e processes involved 
in solving the two types of problem 
Furthermore, median test analyses of 
Harris’ data (24) show that when 
time of solution is the Measure, there 
are no differences between males and 
females on either Problem. In any 
event, the data of the correlational 
analyses indicate that the Ex is no 
better than the Cr as a unit for the 
measurement of rigidity, no matter 
what other hypotheses one might
care to entertain. 


THE WATER-JAR TEST AS A 
RIGIDITY MEASURE 


In addition to suggesting the equivalence of the sundry definitions based on the water-jar test, the data of Tables 2 and 3 also indicate that the water-jar test (WJT) lacks predictive validity. Sixteen of the thirty-one studies report negative results. An additional ten have ambiguous findings. Only five can be considered as supporting the claims for the WJT as a rigidity measure, and it must be remembered that two of these are offshoots of a single dissertation (52).

The number of individual correlational analyses varies considerably among the 31 studies. Eleven studies have only a single correlation, but some report as many as 30 or more. There is a total of 202 individual analyses of the relationships between the WJT and other tests. These involve 111 measures derived from 66 different instruments, exclusive of tests of intelligence. Of the 202 correlations, 151, or 74.75 per cent, do not reach significance at the .05 level or beyond. If we allow for the correlations which are probably significant by chance alone, the percentage of insignificant correlations rises to about 80, roughly the same as the percentage of negative and ambiguous studies shown in Table 2.
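
The chance adjustment in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below assumes that "chance alone" means 5 per cent of all 202 analyses (about ten) and that these are simply moved from the significant to the insignificant column; the function name is hypothetical.

```python
def nonsignificant_share(total, nonsignificant, alpha=0.05):
    """Return (raw %, chance-adjusted %) of nonsignificant analyses.

    Assumption: alpha * total of the significant results are treated
    as chance findings and counted as nonsignificant.
    """
    raw = 100.0 * nonsignificant / total
    expected_by_chance = alpha * total
    adjusted = 100.0 * (nonsignificant + expected_by_chance) / total
    return raw, adjusted


raw_pct, adjusted_pct = nonsignificant_share(total=202, nonsignificant=151)
print(round(raw_pct, 2), round(adjusted_pct, 2))  # prints 74.75 79.75
```

The adjusted figure of roughly 80 per cent agrees with the value given in the text.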

The high proportion of insignificant correlations, like the similar proportion of negative and ambiguous studies, indicates that the WJT lacks criterion validity. It is pertinent to inquire, however, whether any particular criterion test has been found to be more consistently related to the WJT than others. Evaluation is hindered by the fact that only 12 of the 66 criterion tests have been investigated in replicated studies. Nonetheless, it may be revealing to examine the relationship of the WJT and individual criteria.

The California scales. Of the 12 
tests in replicated researches, the Cal- 
ifornia E and F scales have been the 
objects of most attention, probably as 
a joint result of the use of the E scale 
in Rokeach’s provocative study (47) 
and the popular theory linking rigid- 
ity and the antidemocratic person- 
ality. In view of the usually high cor- 
relation between the two scales, they 
will be considered as one measure for 
the analysis of this section. 

There are nine studies (4, 8, 17, 20, 
27, 31, 32, 47, 59) in which either the 
E or the F scale has been used as a 
criterion measure. A total of 1,088 
Ss have been involved. The results
of these investigations are summar- 
ized in Table 4. 

There are 18 individual correlational analyses in the studies.⁷ Fifteen of these are correlation coefficients of various sorts while three are
t tests of differences between mean 
WJT scores for ethnocentric and 
nonethnocentric groups, or between 
E- or F-scale means for rigid and non- 
rigid groups on the WJT. Of the 18 
correlational analyses, only five reach 
the .05 level of significance or be- 
yond. Assuming that the various co- 
efficients are equivalent estimates, 


⁷ Results for a group of children in
Rokeach’s study (47) are not included here in 
the interests of uniformity of research popula- 
tions. 


TABLE 4
REPORTED RELATIONSHIPS BETWEEN THE WJT AND THE CALIFORNIA SCALES OF THE ANTIDEMOCRATIC PERSONALITY†

                 Correlational   Special Experimental
Study    N       Technique       Conditions             Result
4        29      tau             week after stress      -.15
         26                      day after stress        .34*
         24                      none                   -.24
8        80      r               none                    .00
         82                      ego-involvement         .40**
17       33      r               none                    .06
20       50      r_bis           none                    .06
         50                      stress                 -.10
27       50      r_bis           none                   -.23
31       20      t               none                   +significant
         37                      reward incentive       nonsignificant
         16‡     rho             none                   -.13
         23‡                     reward incentive        .09
32       29      tau             none                   -.18
47       70      t               none                   +significant
59       262     r               mild stress            -.03
         135                     none                    .07
         72§                     none                    .30
Average of correlation coefficients                                      .04
Average of all analyses (see text)                                       .07
Average of all analyses of data obtained under special conditions       .05

* Significant at the .05 level.
** Significant at the .01 level.
† The signs of some correlations have been changed so that a positive correlation always indicates that WJT rigidity scores and authoritarianism scores vary in the same direction.
‡ Only Ss with WJT scores greater than zero included.
§ Group of female Naval officers.



the average of the 15 is .04. If we assume further that an insignificant t equals a coefficient of zero, and that a significant t equals a coefficient of .40, the average correlation becomes .07. Evidently the WJT and the California scale of the antidemocratic personality are not related indices.
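
The two averages quoted above can be approximated with a short sketch. Two things in it are assumptions rather than statements from the text: each coefficient is weighted by its N, and the signs of three coefficients that are hard to read in Table 4 (studies 4, 27, and 59) are taken as negative. With those assumptions the figures come out at the reported .04 and .07; the variable and function names are hypothetical.

```python
# (N, coefficient) pairs as read from Table 4; the signs of -.24, -.23
# and -.03 are assumed, since the printed signs are not fully legible.
coefficients = [
    (29, -0.15), (26, 0.34), (24, -0.24),   # study 4
    (80, 0.00), (82, 0.40),                 # study 8
    (33, 0.06),                             # study 17
    (50, 0.06), (50, -0.10),                # study 20
    (50, -0.23),                            # study 27
    (16, -0.13), (23, 0.09),                # study 31 (rho)
    (29, -0.18),                            # study 32
    (262, -0.03), (135, 0.07), (72, 0.30),  # study 59
]

# (N, significant?) for the three t tests (studies 31, 31, and 47).
t_tests = [(20, True), (37, False), (70, True)]


def weighted_mean_r(pairs):
    """N-weighted mean of correlation coefficients (weighting assumed)."""
    total_n = sum(n for n, _ in pairs)
    return sum(n * r for n, r in pairs) / total_n


# Crude conversion used in the text: significant t -> .40, otherwise 0.
converted = [(n, 0.40 if significant else 0.0) for n, significant in t_tests]

print(round(weighted_mean_r(coefficients), 2))              # prints 0.04
print(round(weighted_mean_r(coefficients + converted), 2))  # prints 0.07
```

The total N across the 18 analyses is 1,088, which matches the number of Ss given for the nine studies.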

The data in Table 4 also provide a means of testing Brown’s (8) hypothesis that a significant relationship between authoritarianism or ethnocentrism and the WJT is a function of stressful or ego-involving experimental conditions. There are seven correlations (4, 8, 20, 31, 59) computed from the data of 509 Ss who performed in this type of atmosphere. Only two of these are significant, one being furnished by Brown himself. The average using the crude conversion from t to r as in the previous paragraph is only .05. The average for the remaining 11 correlations is .08. (The respective averages without the converted t’s are .06 and .02.)
ures other than the E and F scales. Of 28 such correlational analyses only one is significant at the .05 level. Apparently, the data will not bear out a conclusion that the WJT’s lack of criterion validity is a function of experimental circumstances.

The Rorschach. Four studies have 
been concerned with the relationship 
between the WJT and various Ror- 
schach indices. A total of 25 such 
indices have been used in the studies; 
however, only 12 will be found in replicated studies. Of these, 11 (high F%, low R, high A%, high F+%, low FC, high W, high Dd, low M, low FY, low content range, and slow reaction time for first response to each card) comprise the Fisher Rigidity Score. In the Applezweig (4) and Katz (27) reports, the Fisher score is given as a unit. Both report nonsignificant relationships with the WJT. Cowen and Thompson (15) used 8 of Fisher's 11 measures with child Ss, and found that four (R, content range, time of first response, and F+%) were related to the WJT.

Both Katz and Cowen and Thompson found a significant relationship between the WJT and u+c. Of 8 indices, this is the only one found to be related by Katz. Those which he

EPORTED IN SIXTEEN WATER-JAR 


Standard 


Percen tage of Ss 


: of 5s 
Ost in Studies Percentage 


Lost from Total 


Using This 
Standard Sample 
Set solution of a requisite number of 
set problems 


Short (or long) control solution - a 24.97 9.27 
Arithmetic accuracy 119 say Ta 
Pooled standards* 113 22.54 re 
24.84 4.7 
Total Loss pon ——— 
634 
26.58 


DEN 
me — 
reports as unrelated include the Reichard Prejudice Score, the Gibby Stability Score, and judges’ ratings of inflexibility and emotional constriction. Cowen and Thompson
used 18 different indices of which 8 
Correlated significantly with the WJT. 
However, the fact that their sample consisted of children tends to vitiate comparisons with the other studies.
is te a was reported to be related 
to the WJT by Cowen and Thompson (15) and by Eriksen and Eisenstein (17). Katz, however, failed to confirm these findings.
Other measures in replicated studies. The relationship between the Wechsler-Bellevue Similarities subtest and the WJT has been investigated by Luchins (39) and Horwitz (26). The former reports a negative result while the latter gives a significant correlation of -.30.

Maltzman, Fox, and Morrisett, and French (20), each made two analyses of the WJT and the Taylor Anxiety Scale. Of the four analyses only one of Maltzman’s is significant.

The Alphabet maze has been used in three studies. Cowen, Wiener, and Hess (16) report a significant r of .42, while the correlation in Vallance’s study (59) is insignificant. Bakan (5) reports a significant r of .26, but this is a result of averaging four individual coefficients which are not independent and is therefore a biased estimate. Only one of the four individual coefficients is significant.

Katz (27) and Schmidt, Fonda, and Wesley (50) have examined the Wesley Rigidity Scale in relation to the WJT. Katz found an insignificant relationship, while the latter experimenters claim to have found a relationship. However, their data analysis is questionable. They divided a group of Ss into three subgroups on the basis of WJT scores, minimally, equivocally, and maximally rigid, and compared mean rigidity scale scores by t tests. Only one of the three t tests, that between the minimally rigid and the maximally rigid groups, was significant. This t test furnishes the basis for the claim that a relationship exists. In designs of this type, the proper procedure is first to compute an analysis of variance of the scores of all three groups. If a significant F does not result, significant individual t tests cannot be regarded as indicating real differences. The over-all F for the three groups in the Schmidt et al. study is 2.89, which falls short of the .05 level. Therefore the t test upon which the relationship claim is based is specious, and the study must be regarded as having essentially negative results.
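
The procedure recommended above (an over-all F test first, pairwise t tests only if F is significant) can be sketched as follows. The scores are made-up illustrative numbers, not data from Schmidt et al., the function names are hypothetical, and the critical value has to be looked up in an F table for the relevant degrees of freedom.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


def pairwise_tests_permitted(groups, critical_f):
    """Pairwise t tests are interpreted only if the over-all F is significant."""
    f_ratio, _, _ = one_way_f(groups)
    return f_ratio >= critical_f


# Hypothetical scale scores for minimally, equivocally, and maximally rigid Ss.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_ratio, df_b, df_w = one_way_f(groups)
print(f_ratio, df_b, df_w)  # prints 3.0 2 6

# The tabled .05 critical value of F for 2 and 6 degrees of freedom is 5.14.
print(pairwise_tests_permitted(groups, critical_f=5.14))  # prints False
```

In this toy case the extreme groups differ by two points, yet the over-all F falls short of the critical value, so a significant t between them would not be accepted as evidence of a real difference.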

Oliver (45) and Horwitz (26) found 
no relationship between the WJT 
and mirror writing of letters and 
words. 

Horwitz (26) and Eriksen and 
Eisenstein (17) related the WJT to 
performance on reversible figures. 
Both studies used the reversible 
staircase and the Necker cube. Hor- 
witz also used the reversible profile. 
In the Eriksen-Eisenstein work, per- 
formance on the two figures was 
grouped into a single score. None of 
the correlations are significant except 
for the Necker cube in the Horwitz 
experiment. However, the coeffi- 
cient of .30 is in the opposite direction 
from what would be expected if the 
WJT was measuring rigidity. 

Applezweig (4) and Horwitz (26)
computed correlations between the 
WJT and the Hidden Words Test. 
None of four individual correlation 
coefficients is significant. 

Three different measures derived 
from Maier’s two-string problem 
have been used as criterion indices.
Adamson and Taylor (1) obtained 
one significant chi square of three in 
attempting to relate the WJT to a 
“functional fixedness” score. Guetz- 
kow (23) reports two insignificant 
analyses, one based on a “stereo- 
typy ratio” and the other on correct- 
ness of solution of the two-string 
problem. 

The relationship between the WJT 
and level of aspiration measures on 
the Rotter Board has been investi- 
gated by Horwitz (26) and Harway 
(25). The latter employed 11 differ- 
ent measures, and analyzed differ- 
ences in both means and variances 
for a rigid and nonrigid group differ- 
entiated by the WJT. Two of Har- 
way’s measures, number of unusual 
shifts of estimate (i.e., up after fail- 
ure, down after success) and the ab- 
solute discrepancy between estimates 
from trial to trial, were replicated 
by Horwitz. His correlations are 
both insignificant, but Harway's t
for the second 
-05 level of significance. Of Harway’s 
11 measures, fo 
cant mean and 
In addition toa 
between estimates, 


sures from the 
om the Hidden 
e were only two 
fferences for the 


Ther 
significant mean di 
WJT, and only one of the Hidden 
Words, a total of 7 of 33 mean dif- 
ferences derived from the three tests. 
However, there were five significant 
variance ratios from the WJT, and 
seven from the Hidden Words, a 
total of 16 for the three instruments. 
Of the seven mean differences, six 
also have significant variance differences. Variance differences are difficult to interpret, especially in the absence of accompanying mean differences. Certainly there is no particular theory or hypothesis concerning personality rigidity which would easily encompass variance differences. Such differences may have a real theoretical meaning, but it would be unduly optimistic to say that they reflect favorably on the validity of the WJT as a rigidity measure, especially since the same statement obviously cannot be made for the mean differences.

There are a number of tests used in WJT investigations which are not found in replicated studies, but which may be grouped under certain usual headings. Four of these are what are commonly regarded as tests of concept formation. Forster, Vinacke, and Digman (19) found no relationship between the Vigotsky and the WJT, and between another test and the WJT. Katz (27) reports a similar insignificance for the Wisconsin Sorting Test. Solomon’s (54) “organization of biology concepts” scale was significantly related to the WJT.

Several investigations deal with the relationship between the WJT and emotional adjustment. Cowen and Thompson (15) found no relationship between the WJT and the Bell Adjustment Inventory or the California Test of Personality. Here again, the use of a child population precludes comparison with other studies. Ainsworth (2) derived an adjustment score from a security-insecurity inventory, which turned out to be unrelated to the WJT. Meer's (43) results with the Maslow Security-Insecurity index were also negative. Horwitz (26) and Levine (30) compared groups of normals and psychiatric patients. In neither case were any significant differences in WJT scores obtained.

Three studies made use of abstract 
reasoning tests. Insignificant correla- 
tions with the WJT were reported by 
Forster et al. (19) for the Duncker 
reasoning tasks; the matchbox, cork, 
X ray, and “13.” Solomon (55) found 
that the WJT was related to responses to only one of four science
questions after a laboratory course 
designed to overcome the common 
misconceived answers. Sivers (51) 
apparently did not find a relation- 
ship between the WJT and Form A 
of the Abstract Reasoning Test; the 
data are not presented in sufficiently clear form to be certain.⁸

Two experiments dealt with instruments which may be thought of as measuring perceptual intolerance of ambiguity. Relationships between the WJT and the Mooney-Ferguson Closure Tests I and II, the Frenkel-Brunswik Changing Figures Test, and the Levy Design Preference Test were computed by French (20) for two groups. None of the eight coefficients is significant. Eriksen and Eisenstein (17) found that the WJT was related to “availability of hypotheses,” i.e., the number of guesses as to the identity of objects shown tachistoscopically at subrecognition speeds.

Other perceptual tasks included the Angyl dots (4), Hidden Objects ( ), and speed of recognition of tachistoscopically-presented words preceded by erroneous expectancy (17). Of seven correlations, only one, Hidden Objects, is significant. Two of the other correlations of the WJT with Hidden Objects are not significant.

⁸ The lack of clarity in no way reflects on Sivers' abilities. A study of the relationship between the WJT and the ART was not part of his design. The present writer has estimated the degree of relationship from data presented by Sivers for other purposes.

Motor and perceptual tasks. A num- 
ber of motor or perceptual-motor 
tasks have been used in the WJT 
studies. These include mirror writ- 
ing, word construction, figure simi- 
larities, maze tracing, code decipher- 
ing, arithmetic speed, etc. The AAF 
Aviation Psychology Program Re- 
search Report No. 5 (21) lists seven 
“change of set tests” which fall into 
this group. Three of the seven were 
significantly related to the WJT, but 
the highest of the three coefficients 
was only .18. Three of five motor 
tasks in Oliver's battery (45) were 
related to WJT scores; the highest 
coefficient was .25 for the Gottschaldt 
figures. None of the three tasks used 
by Horwitz (26) was found to be 
related to the WJT. A contour-draw- 
ing test (19) was also unrelated. 

Miscellaneous measures. Rela- 
tionships reported in unreplicated 
studies, or in studies which do not 
fall into usual groupings, are of less 
import in evaluating the WJT. How- 
ever, a number of such attempts are 
listed here for purposes of complete- 
ness. 

Eight Thurstone scales were ad- 
ministered to a group of Ss also per- 
forming on the WJT by Goodstein 
(22). None of the correlations was 
significant, the highest being only 
.03. Goodstein also found a similar 
absence of association for an ana- 
grams test, and for the Shipley-Hart- 
ford Retreat Scale. Peer ratings were 
reported to be unrelated by Vallance 
(59). Decision time as measured by 
the Festinger-Wapner test did not 
distinguish rigids and nonrigids, ei- 
ther among normals or psychiatric 
patients (30). Cowen (14) found 
that the WJT failed to discriminate 
high and low “negative self-concept” scorers on the Brownfain Self-Rating Inventory.

On the positive side, 20 of 33 items 
in Solomon’s “aspects of scientific 
method” scale (53) successfully sep- 
arated high and low scorers on the 
WJT. Brown (8) found that the WJT 
was related to n Achievement under 
ego-involving conditions, but not in 
an ordinary experimental situation. 
Solomon (56) reported that stutterers 
tended to show more WJT rigidity 
than nonstutterers. 

Measures of intelligence. Luchins 
reported in his 1942 monograph (34) 
that there was no relationship be- 
tween WJT scores and intelligence. 
No correlation coefficients or other 
statistical demonstrations are pre- 
sented. He based his conclusion on 
the fact that differences in Einstel- 
lung effect varied only slightly among 
groups of different ages and educational levels. Such variation as did
occur was attributed to “differences 
in attitudes towards and interpreta- 
tions of their tasks and instructions, 
rather than sheer differences in age 
or educational level” (34, p. 19).

However, the fact that there is no correlation between mean scores of groups and intelligence does not preclude the possibility of significant
correlations within groups. A more 
objective evaluation can be obtained 
from the results of 12 studies in which 
the relationships between the WJT 
and seven different measures of intel- 
ligence are reported. The Cowen 
and Thompson work (15) with chil- 
dren is again considered separately, 
They report no relationship with the 
Pintner General Abilities Test.
Rokeach (47) found no association 
between either the Stanford-Binet 
or the Wechsler and the WJT in an 
adolescent group. Absence of rela- 
tionship is reported by Applezweig 
(4) for the Navy GCT, and by Horwitz (26) for the Wechsler. Horwitz administered only three subtests, Arithmetic, Comprehension, and Similarities. The total score is unrelated to the WJT, but the latter two subtests have significant individual correlations with the WJT. Katz (27) found no relationship between the WJT and a composite score on Iowa Entrance Examination tests. French (20) found a similar lack of correlation for the AFQT.

Five studies involved the WJT and the ACE. Four of these (5, 7, 43, 45) report significant negative relationships between the two (high rigidity, low intelligence). The remaining study (53) did not find a relationship. Vallance (59) found low but significant correlations between the WJT and academic grades in engineering and navigation obtained by students at a Navy OCS.

Again assuming that all correlation coefficients are equivalent estimates, the average correlation based on
1,218 Ss in nine studies is “a 
Since no coefficient is reported by Benedetti and Douglas (7), their findings are not included. It may be reasonably assumed that the inclusion of their result would raise the
average coefficient to about =e T 
A small portion of the variance of WJT scores is thus probably a function of intelligence. However, this conclusion should be viewed with caution
since the correlation of —-2 b- 
is largely a result of relationships obtained with the ACE, especially its “Q” subtest. Other tests yielded mostly insignificant results, though almost all were in the right direction.

We conclude that no particular test or type of test, except tests of intelligence, appears to be consistently or clearly related to the WJT. This conclusion must be tempered in light of the multiplicity of instruments used and the lack of replication. It is most particularly applicable to the California scales, which have long been considered as a rigidity criterion for performance measures. It appears to be more or less applicable to the Rorschach, to tests of concept formation, to emotional adjustment, to reasoning tests, and to various perceptual and motor tasks.


FACTOR-ANALYSIS STUDIES


Factor analyses of batteries of 
tests including the WJT were per- 
formed by Horwitz (26) and Oliver 
(45). Apparently such an analysis 
was intended for the “change of set” 
tasks in the AAF program (21), but 
the plan was abandoned when only 
seven of the 28 r’s turned out to be
significant, the largest being only .18. 

Horwitz’ battery included nine 
tests. Separate analyses were done 
for the normals and psychiatric pa- 
tients. The results in each instance 
were much the same. A problem- 
solving rigidity factor was heavily 
loaded with IQ, leading Horwitz to 
conclude that low intelligence is “an 
important determinant in problem 
solving rigidity” (26, p. 70). Horwitz also derived a “strength of set” measure from the WJT by interviewing Ss to determine the method used in solving the water-jar problems. He grouped responses under four headings ranging from those which tended to establish the strongest set to those which led to the weakest. The “strength of set” measure was included in another factor in which arithmetic ability had a heavy loading. Hence Horwitz concludes that poor arithmetic skill accounts for the establishment of a weak set in the WJT.

Horwitz’ general conclusion is that “the Einstellung tests appear with strong loadings on the intelligence factor but fail to cluster with any of the other rigidity tests” (26, p. 97). His findings are weighted further by his use of the Wechsler as an IQ measure rather than the ACE.

Oliver included ten measures in his 
battery, of which five were motor or 
perceptual-motor tasks which he had 
developed, three were ACE subtests, 
and the last was the Gottschaldt fig- 
ures. Of the three factors extracted 
by Oliver, the WJT contributed only 
to General Reasoning Ability, a fac- 
tor composed mostly of the ACE 
tests. It had a slight negative weight- 
ing for the Disposition Rigidity fac- 
tor. Oliver concludes that if his Dis- 
position Rigidity factor is validly 
labeled, then the WJT does not meas- 
ure this characteristic. However, as 
in the Horwitz analysis, the WJT 
appears to be clearly involved with 
intelligence. 

The factor analyses of Horwitz 
and Oliver support the results of the 
correlational studies of the WJT and 
intelligence tests, and lend credence 
to the hypothesis that scores on the 
WJT are in part a function of intel- 
ligence. 


PERFORMANCE ON THE WJT 
UNDER STRESS 


A number of studies attempt to 
demonstrate the validity of the WJT 
as a rigidity measure without the use 
of a criterion test. The basic design,
reasoning, and intent of these studies
are relatively uniform. The hypothe-
sis under examination is that rigidity
will increase as a function of stress. 
The Ss who perform under conditions 
of ego-involvement, anxiety, frustra- 
tion, anticipated failure, and so forth, 
will manifest a greater frequency of 
long solutions to the water-jar prob- 
lems than individuals to whom the 
test is administered under nonstress-
ful circumstances. If the experi-
mental results are in accordance with
the hypothesis, it is customary for
the experimenter to accept his re-
sults as evidence that the WJT is a
valid measure of rigidity. The logic
of this conclusion will be discussed in
a subsequent section of this paper.
For the moment, we are concerned 
with results obtained in experiments 
of this nature. 

Investigations of the effects of 
stress on the WJT scores have been re- 
ported by Christie (10), Harris (24), 
Pally (46), and Cowen (12, 13). The 
studies of Christie and Harris are 
practically replicates; both used the 
same design and the same WJT meas- 
ure, time required to solve a single 
Ex problem following a series of sets 
and a Cr. Christie found that 15 frus-
trated Ss took a mean of 157.66 sec-
onds to complete the Ex while a
like number of unfrustrated Ss re- 
quired only 69.87 seconds on the 
average. The critical ratio of the dif-
ference [...]. [...] variance suggested
by Cochran and Cox (11), we find
[...] significance at
[...], and the differ-
ence between the groups is n[ot significant].
On the other hand, the
distributions of scores are skewed,
a fact mentioned by Harris as lead-
ing to his use of a log transformation
of the data. If we apply a median 
test to Christie’s data, we obtain a 
chi square of 4.80, significant beyond 
the .05 level for 1 df. 
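
The median test applied in this section can be sketched in a few lines. What follows is a minimal illustration, assuming a two-group median test with the customary continuity correction for a 2 x 2 table. The function names are hypothetical, and the cell counts in the usage lines are not reported in the studies under review; they are simply splits which happen to reproduce the chi squares quoted in this section for groups of 15 and of 18 Ss.

```python
from statistics import median


def chi_square_2x2(a_above, a_below, b_above, b_below, yates=True):
    """Chi square (1 df) for a 2 x 2 table of counts above / not above the median."""
    n = a_above + a_below + b_above + b_below
    margins = ((a_above + a_below) * (b_above + b_below)
               * (a_above + b_above) * (a_below + b_below))
    if margins == 0:
        raise ValueError("a marginal total is zero; chi square is undefined")
    diff = abs(a_above * b_below - b_above * a_below)
    if yates:
        # Continuity correction: shrink |ad - bc| by n/2, never below zero.
        diff = max(diff - n / 2, 0.0)
    return n * diff ** 2 / margins


def median_test(group_a, group_b, yates=True):
    """Median test on raw scores: dichotomize at the pooled median, then chi square."""
    cut = median(list(group_a) + list(group_b))
    a_above = sum(x > cut for x in group_a)
    b_above = sum(x > cut for x in group_b)
    return chi_square_2x2(a_above, len(group_a) - a_above,
                          b_above, len(group_b) - b_above, yates)


CRITICAL_05_1DF = 3.841  # tabled chi square for p = .05 with 1 df

# Assumed splits (NOT reported in the source) for two groups of 15 and of 18 Ss.
chi_15 = chi_square_2x2(11, 4, 4, 11)
chi_18 = chi_square_2x2(12, 6, 6, 12)
print(round(chi_15, 2), chi_15 > CRITICAL_05_1DF)
print(round(chi_18, 2), chi_18 > CRITICAL_05_1DF)
```

Under the assumed splits the first value exceeds the tabled 3.84 and the second falls short of it, which is the pattern described in the text. The sketch dichotomizes at the pooled median and counts scores strictly above it; other conventions for handling scores tied at the median would give slightly different tables.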

The facts are more or less reversed
for Harris’ data. Using the log trans-
formation, he finds that 18 Ss in the
stress group required 202.94 log sec-
onds to solve the Ex, while the non-
stress Ss needed only 55.56 log sec-
onds on the average. The t is given
as 2.59, which is significant at the .01
level. However, analysis of the raw
data using a median test results in a
chi square of 2.78 which falls short
of the .05 level of significance.

The interpretation of Christie’s
and Harris’ results is thus wide open,
with the particular choice depending
in large part upon which statistical
analysis the interpreter wishes to
credit. Certainly neither study mer-
its the unqualified citations which
it has received in later publica-
tions.

Pally’s study (46) is cumbersome,
poorly reported, and difficult to eval-
uate. He divided his Ss into four
groups of 20 each, Groups A and B
experiencing failure on tests preced-
ing the WJT administration, Group
C experiencing success, and Group
D being neutral. The WJT had 10
Cr’s followed by an Ex. Once an S
succeeded in solving a Cr by the short
method, the experiment ceased for
him. Pally proceeded to compute
four measures for the groups: time
required to solve the first Cr by the
short method (if one was solved),
the mean number of Cr’s solved, the
number of Ss having to solve the Ex,
and the mean time required for the
Ex solution. It is obvious that these
measures are not independent of one
another. There is almost certain to
be a marked relationship between
the number of Cr’s attacked by the
S and time involved in the first short
solution. Similarly, the number of
Cr’s must be related to the number of
Ss having to do the Ex, and so on.
Hence the significance of at least the
last three measures is likely to be un-
clear, especially if the results are con-
flicting.

Pally notes that he analyzed the
time for the first Cr scores and the
number of Cr’s by an analysis of
variance. However, no F ratios are
reported. A table of p values based
on t tests and chi squares is given,
but the over-all analyses are absent.
The various p values show that
Groups A and B did not differ on any
of the four measures. The same is
true of Groups C and D. Group A
differed from Groups C and D on
three measures each. Group B dif-
fered from C and D on two measures
each. In each instance, the measures
providing significant differences are
not the same for C and D. Both
Group A and B differed from Group
C on mean time for solution of the
Ex, but neither differed significantly
from Group D. However, these find-
ings are not directly comparable to
those of Christie and Harris since
only 30 of Pally’s 80 Ss reached the
Ex.

Cowen (13) divided his Ss into
three groups of 25 each. One group
was subjected to “mild stress” prior
to the WJT administration, a second
group to “severe stress,” while the
third group was a control receiving
no stress. He recorded the number
of long solutions, the time of response
to all problems, and the time of re-
sponse to an Ex. In each case, the
means show a clear trend from fewest
long solutions and shortest response
times for the control group, to most
and longest for the severe stress
group, with the mild stress group in
between. The three F ratios are
highly significant. In a corollary
study, Cowen (12) contrasted a stress
group and a “praise” group. Signifi-
cant differences were obtained for
number of long solutions and for time
to solve an Ex. The difference in time
of solution of all problems was not
significant. Cowen concludes that
“less rigid behaviors were noted in 
the ‘praise’ group, presumably as a 
function of the anxiety-reducing ef- 
fects of E’s praise and reassurance” 
(12, p. 427). 

This conclusion deserves some fur- 
ther consideration in light of the pre- 
vious Cowen study (13). In that 
study, a neutral control group had a 
mean of 1.20 long solutions. The 
praise group in the second work had 
a mean of 3.16. The stress group of 
the second study (apparently the 
same group listed under “severe 
stress” in the earlier report) had a 
mean of 5.12. Evidently, Cowen’s 
conclusion is not borne out by the 
data which show clearly that both 
stress and praise succeeded in in- 
creasing the proportion of long solu- 
tions. The same inference may be 
drawn from the data on time of re- 
sponse. The mean time for all prob- 
lems for the praise group is 33.60 sec- 
onds. For the neutral control, the 
mean is 21.28 seconds, and only 30.36 
seconds for the mild stress group! 
The mean time of solution for the Ex 
is 75.20 seconds for the praised Ss, 
24.72 seconds for the neutral control, 
and only 62.64 seconds for the mild 
stress group. An interpretation of 
this analysis is not immediately ap- 
parent. Perhaps praising an S for 
performance on a projective test and 
expressing interest in his further per- 
formance for correlational purposes 
(Cowen’s “praise” technique) actu-
ally places the S in a stressful situa-
tion.

The study of Sivers (51) furnishes 
an interesting comparison with the 
five experiments discussed thus far 
in this section. Sivers approached 
the question from another angle. He 
distinguished a rigid and a nonrigid 
group on the basis of WJT scores and 
then selected a subsample of 44 Ss 
from each group, matched on the 
basis of scores on Form A of the Ab-
stract Reasoning Test. Half of the
Ss in each group were subjected to 
stress, after which all Ss took Form 
B of the ART. An analysis of vari- 
ance of the difference scores between 
the two forms showed a highly sig- 
nificant variance due to stress, but 
an insignificant variance for rigidity, 
and no interaction. In other words, 
stress interfered with performance on 
Form B, but the effects were uncom- 
plicated by rigidity. The abstrac- 
tion ability of the rigid Ss was no 
more impaired by stress than that of 
the nonrigid Ss. The implications of 
the Sivers study relative to the find- 
ings of Christie, Harris, Pally, and 
Cowen, will be discussed in the next 
section. 

The investigations of Brown (8), 
Applezweig (4) and French (20), 
though primarily correlational stud-
ies, also permit comparisons of de-
scriptive data under stress and non-
stress conditions. Differences be-
[...]
had the largest. The critical ratio of
this difference is given as 2.04 with a
p beyond the .05 level.

Applezweig also reports a signifi- 
cant variance ratio for the scores of 
the two groups. As in the case of 
Christie’s data, the ordinary table of 
probability cannot be used. The 
Cochran-Cox adjustment raises the 
CR required for the .05 level of sig-
nificance to 2.06, so that Applezweig’s
CR actually falls short. Furthermore,
as is often the case with WJT data, 
Applezweig’s distributions are mark- 
edly skewed (Cf. 3). If we apply a 
median test to the data of the un- 
stressed and week-after-stress groups, 
a chi square of only 0.84, p=.26, is 
obtained. 
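
The adjustment referred to above can be illustrated with a short sketch. It assumes the familiar Cochran-Cox approximation, in which the critical value for two samples with unequal variances is a weighted average of the tabled t values for each sample's own degrees of freedom, the weights being the squared standard errors of the two means. The function names, sample sizes, and variances below are hypothetical and are not taken from the studies reviewed; only the tabled t values (2.145 for 14 df, 2.262 for 9 df, two-tailed .05 level) are standard.

```python
def cochran_cox_critical(var1, n1, t_crit1, var2, n2, t_crit2):
    """Approximate critical ratio when the two group variances are heterogeneous.

    t_crit1 and t_crit2 are tabled t values for n1 - 1 and n2 - 1 df.
    """
    w1 = var1 / n1  # squared standard error of the first mean
    w2 = var2 / n2  # squared standard error of the second mean
    return (w1 * t_crit1 + w2 * t_crit2) / (w1 + w2)


def critical_ratio(mean1, var1, n1, mean2, var2, n2):
    """Difference between means divided by its (unpooled) standard error."""
    return (mean1 - mean2) / (var1 / n1 + var2 / n2) ** 0.5


# Hypothetical illustration: 15 and 10 Ss with clearly unequal variances.
required = cochran_cox_critical(var1=4.0, n1=15, t_crit1=2.145,
                                var2=16.0, n2=10, t_crit2=2.262)
observed = critical_ratio(mean1=5.0, var1=4.0, n1=15,
                          mean2=2.2, var2=16.0, n2=10)
print(round(required, 3), round(observed, 3), observed >= required)
```

The adjusted critical value always lies between the two tabled values, and it moves toward the value for whichever group contributes the larger share of the error variance. A critical ratio which just reaches the unadjusted .05 level can therefore fall short once the adjustment is made, as the text reports for the 2.04 obtained here.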

If the results of the Brown, French, 
and Applezweig studies are averaged, 
the over-all mean score for the stress 
group is 2.75 short solutions, and 3.00 
short solutions for the nonstress 
group. This difference is not likely 
to be significant. Certainly it is far 
smaller than those to be found in the 
Cowen studies. We may reasonably 
conclude that the descriptive data 
of the correlational studies do not 
seem to support the conclusion that 
stress is accompanied by an increase 
in long WJT solutions. 


STRESS AND THE VALIDITY OF THE 
WJT AS A MEASURE
OF RIGIDITY 

Some of the experimenters who 
have used the WJT in stress studies 
carefully phrase their results in terms 
of “problem-solving rigidity” or some 
similar expression. The inference,
whether expressed more or less overtly
or allowed to remain implicit, is that
rigidity is a function of the situation
rather than of the personality. None-
theless, in discussion sections subse-
quent to experimental results, these
same experimenters will make gen-
eralizations from situational to per-
sonality rigidity. Problem-solving
rigidity is viewed as a “paradigm of
maladaptive behavior” (13, p. 518),
or it is regarded as “the same as that
observed clinically and reported in
studies dealing with a variety of
pathological states” (46, p. 352).
Because of such statements, the stu-
dies of the WJT under stress were
considered by the present writer to
be actual efforts to demonstrate the
validity of the WJT as a rigidity
measure.

In summing up the various studies
of performance on the WJT under
stressful conditions (4, 8, 10, 12, 13,
20, 24, 46, 51), we could hardly say
more than that the over-all picture
is beclouded. The evidence certainly
will not support the unqualified con-
clusion that there is a greater degree
of WJT “rigidity” manifested under
stress than under nonstress circum-
stances. But let us assume, for pur-
poses of discussion, that this conclu-
sion is warranted. How does it bear
on the validity of the WJT as a meas-
ure of rigidity?

To begin with, the WJT is a learn-
ing paradigm, similar in many re-
spects to other tasks used in learning
experiments. Its singular character- 
istic is that one of the two competing 
responses is made dominant, but the 
weaker response is the “correct” one.
Studies of the effects of stress on this
type of learning have been carried out
by Castaneda and Palermo (9), Far-
ber and Spence (18), and Montague
(44) among others. The findings are
summarized in the following quota-
tion:


If the habit strength of the correct response 
should be relatively weak, an increase in drive 
should further increase the strength of the in-
correct tendencies relative to the correct
tendency, resulting in impaired performance.
Furthermore, the degree of impairment should
be a positive function of the number and
strength of the competing incorrect response
tendencies (18, p. 120). 


A translation of the Hullian dialect
into water-jar terms leads to this
statement: “The administration of
set problems prior to the Cr’s or Ex’s
makes the long (incorrect) solution
the dominant one. When the S is
placed in stress, the frequency of
dominant responses to the Cr’s or
Ex’s increases. Furthermore, it in-
creases as a function of the number
of set problems which were admin- 
istered (i.e. as a function of the 
strength of the incorrect tendency).” 

The hypothesis that stress is ac- 
companied by an increase in long 
solutions is thus one which comes 
out of learning theory, and has been 
demonstrated by learning experi- 
ments. It fits the data of tasks like 
learning paired associates, discrimi- 
nating colored lights, or pulling levers 
as well as it does the results with the 
WJT. The findings of Cowen, 
Christie, etc., thus have no particu- 
lar bearing on the validity of the 
WJT as a rigidity measure unless one 
is willing to accept any simple learning 
task of a certain type as a rigidity 
measure. This inclusion would surely 
be unacceptable to those who regard 
the WJT as a personality index. 

Sivers (51) provides admirable
support for this stand. He showed
that rigid and nonrigid Ss on the
WJT manifest similar impairment in
performance on another task when
stress is introduced. If the WJT were
measuring rigidity, we would expect
that the “rigid” Ss would be more
affected by stress. In other words, 
while the WJT functions adequately 
as a learning task, it fails as a diag- 
nostic instrument. 


THE WJT AS A PSYCHOMETRIC
INSTRUMENT 


Despite the widespread use of the 
WJT by psychologists, especially in 
doctoral dissertations, there has been 
only casual concern with its defects 
as a psychometric tool. There ap- 
pear to be three such defects, any 
one of which would be likely to be 
regarded as serious by formal test 
constructors. Two of these—loss of 
subjects due to criteria for accepting 
an experimental protocol, and the 
skewness of distributions of scores—
were pointed out several years ago 
by Levitt and Zelen (31). The third, 
unamenability of the test to esti- 
mates of reliability, has been practi- 
cally ignored by experimenters. Each 
of these points warrants some ex- 
tended discussion. 

Reliability. Sivers (51) is one of 
the few experimenters who has been 
concerned with the reliability of the 
WJT. He sums up the matter con- 
cisely, thus: 


The reliability of the water jar test as a 
measuring instrument is difficult to establish 
directly. Most of the commonly used tech-
niques do not suffice, for in the course of taking
the test problem series many subjects discover 
that they have not always availed themselves 
of the direct method. Once a subject is con- 
sciously aware of what he might have done on 
previous problems, and if he has used the 
indirect method when the direct method could 
have been employed, he comes alert to further 
possibilities of this kind. For this reason, a 
test-retest situation is inappropriate. A split- 
half technique is obviously not to be con- 
sidered inasmuch as test items cannot be 
equated (51, pp. 52-53).


In short, performance on a subse-
quent test will be likely to be af-
fected by performance on the original
test for many Ss. Test-retest and
equivalent forms are thus out of the
question. [...] actually attempted to
construct eq[uivalent forms] [...]
correl[...].
However, Bakan’s [...]cal type,
not used by any [other] ex-
perimenter, and it is not certain that
the reliability which she obtained can
be generalized to include other forms.

⁹ The forms are not literally equivalent
since the means differ significantly.

The assumptions necessary for the
computation of statistical estimates
of reliability are obviously not satis-
fied by a test in which the perform-
ance on any one item is apt to be af-
fected by performance on previous
items. In fact, there does not seem 
to be any sound way in which the re- 
liability of the WJT could be esti- 
mated. The conscientious experi- 
menter who makes use of the WJT 
must find some method of rationaliz- 
ing away his inability to estimate its 
reliability. 

Loss of subjects. One of the unusual 
aspects of the WJT as a psychometric 
tool is that its use seems invariably to 
lead to a greater loss of Ss from the 
experimental sample than is custom- 
ary in psychological research. The 
loss is a result of various standards
of performance required by the ex-
perimenter on the preliminary prob-
lems in the test. The S who fails to
perform in the requisite manner is
eliminated from the final, crucial
phase of the testing. Unfortunately,
almost half of the studies do not re-
port either the standards or the sub-
ject loss, although it is probable that
the standards were applied in most,
or all, cases. A few studies note the
standards, but not the loss. In sev-
eral instances, multiple standards
were used, and only a pooled loss is
recorded. Of 34 studies of adult sam-
ples, only 16 report both standards
and loss, while 15 report neither. 

One criterion is a sine qua non (in
its most literal sense) of any WJT
investigation—the solution of a req-
uisite number of set problems by the
long method. It is an obvious and
well-demonstrated fact that per-
formance on subsequent problems
will be a function of the number of
set problems solved by the long
method. Hence the experimenter
must necessarily see to it that all Ss
advancing to the crucial stage have
solved the same, or approximately
the same number of sets by the long
method. Despite the evident im-
portance of this standard, only 12
studies report its application, and
two of these do not give the conse-
quent subject loss. In two of the re-
maining 10, a pooled loss due to mul-
tiple criteria is given.

In the remaining eight studies, 221 
of 885 Ss, a total of 24.97 per cent, 
have been lost due to this criterion. 
This amounts to 9.27 per cent of the 
total sample of 2,385 Ss in the studies 
reporting both standards and loss. 
These data, as well as the losses due
to other criteria, are shown in Table [...].

In the Rokeach and Cowen forms 
of the WJT, a short solution (or a 
long solution) of a control problem 
is also used as a standard. In the 
seven studies in which the loss due 
to this criterion can be assessed, 181 
of 886 Ss had to be discarded for fail- 
ure on the control problem. This 
amounts to 20.43 per cent, or 7.59 
per cent of the Ss in all studies giving 
both criteria and loss. 

Occasionally a criterion of “arith-
metic accuracy” is applied. It is
most often not clear just what the 
experimenter means by this expres- 
sion; in some instances, the loss may 
be due to simple inability to add and 
subtract. In others, this standard is 
probably the same as the require- 
ment of long solution of the sets 
(naturally, the long solution cannot 
be properly used unless the arith- 
metic computations are accurate). 
One hundred and nineteen Ss, or
22.54 per cent of 528 Ss, were lost in
two studies due to this criterion. This 
is 4.99 per cent of the over-all sample. 

In four studies where the loss due
to multiple criteria is given as one
figure, or where there is an unex-
plained loss, a total of 113 Ss of 455
—24.84 per cent—were lost.¹⁰ This
amounts to 4.74 per cent of the whole
sample.

¹⁰ An indeterminate number of these 113 Ss
were lost due to absences from testing ses-
sions or failure to volunteer to continue with
the experiment. These losses, of course, can-
not be attributed to the properties of the

Over all, 634 Ss, or 26.58 per cent 
of the 2,385 Ss who were originally 
tested were eliminated from the final 
phase of the experiments. The per- 
centage tends to be much higher 
when younger Ss are tested. One 
hundred and sixty-three of 286 child 
Ss in the studies of Rokeach (47) and 
Cowen and Thompson (15) had to 
be eliminated, a loss of 57 per cent of 
the sample. 
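
The attrition figures quoted in the preceding paragraphs can be checked by simple arithmetic. The following sketch merely recomputes the percentages from the counts given in the text; the dictionary keys and function names are labels introduced here for convenience and do not appear in the studies themselves.

```python
TOTAL_TESTED = 2385  # Ss in the adult studies reporting both standards and loss

# criterion label -> (Ss lost, Ss tested in the studies where the loss is assessable)
LOSSES = {
    "long solution of set problems": (221, 885),
    "control problem": (181, 886),
    "arithmetic accuracy": (119, 528),
    "multiple or unexplained": (113, 455),
}


def pct(part, whole):
    """Percentage rounded to two decimals, as reported in the text."""
    return round(100.0 * part / whole, 2)


def summarize(losses, total):
    """Per-criterion loss (within its own studies, and of the whole sample) plus totals."""
    rows = {name: (pct(lost, tested), pct(lost, total))
            for name, (lost, tested) in losses.items()}
    total_lost = sum(lost for lost, _ in losses.values())
    return rows, total_lost, pct(total_lost, total)


rows, total_lost, overall = summarize(LOSSES, TOTAL_TESTED)
for name, (within, of_total) in rows.items():
    print(f"{name}: {within} per cent of own studies, {of_total} per cent of all Ss")
print(f"over all: {total_lost} Ss, {overall} per cent")
print(f"child Ss: {pct(163, 286)} per cent")
```

The four separate losses sum to the 634 Ss given in the text, and each recomputed percentage agrees with the figure reported there.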

Nor can the loss be attributed to 
group administration of the test. In 
10 studies using such administration, 
25.39 per cent of the Ss were lost, 
while in four reports of individual 
test administration, 32.57 per cent 
were lost. The loss was 22.66 per 
cent in two studies which did not 
note the type of administration. 

The over-all loss of over 25 per cent 
in the adult studies is a sizable attri- 
tion, and might very well result in a 
sampling bias. And losing one out 
of every four Ss halfway through an 
experiment is hardly economical of 
time and research populations. 

¹⁰ (continued) WJT. In all probability, the number of Ss
thus lost is quite small, and a correction for
the loss would change the over-all loss only
slightly.

The distribution of WJT scores. A
test constructor ordinarily strives to
develop an instrument which will
provide a normal, or nearly normal
distribution of scores. His aim may
be linked to theoretical considera-
tions, but more importantly, a nor-
mal distribution enables the experi-
menter to apply parametric statistics
—the most powerful available—in
analyzing results. Regardless of the
test, the individual experimenter
rarely obtains true normality since
most samples are relatively small.
However, if the curve of the distribu-
tion is symmetrical about a maxi-
mum ordinate at the mean, or even
if it is asymmetrical but not mark-
edly skewed, the experimenter will
usually assume normality in the par-
ent population so that he can resort
to a parametric analysis. (The exact
nature of the population distribution
is seldom known.) But when the ob-
tained distribution is multimodal,
J shaped, or plainly skewed in some
fashion, the assumption of underly-
ing normality becomes untenable,


[Figure 1; horizontal axis: possible water-jar test score.]

FIG. 1. DISTRIBUTIONS OF WATER-JAR TEST
SCORES SHOWING PERCENTAGES OF Ss OB-
TAINING THE HIGHEST, LOWEST, AND MIDDLE
SCORE OF THE RANGE OF POSSIBLE SCORES.
THE Cr-CrEx CURVE IS BASED ON 442 Ss
IN SIX STUDIES, THE Ex CURVE IS BASED ON
166 Ss IN TWO STUDIES.


[...] large num-
ber [...]. They suggested
[...] involved were prob-
ably badly skewed. In the studies in-
cluded in the present review, six
others (8, 16, 20, 21, 24, 30) speci-
fically mention obtaining skewed dis-
tributions. Brown (8) and French
(20) note that variance data are not
presented because of skewness. Maltz-
man et al. (42) and Applezweig (4)
appear to be cognizant of maldistri-
bution in their data, judging by their
use of tau, a nonparametric correlation
coefficient. Harris (24) reported that he
converted his time measures into 
logarithms since the distribution of 
raw data was skewed. 

Nine studies (2, 3, 10, 24, 25, 27, 
31, 32, 39) either present the distribu- 
tions of scores, or give sufficient in- 
formation from which the shape of 
the distribution may be inferred. Of
these, the distributions of time meas-
ures in the works of Harris (24) and
Christie (10) are both clearly non-
normal. The range of possible scores
in the studies using Cr, CrEx or Ex
measures varies from 0-4 to 0-7, so
that direct comparability is not sim-
ply accomplished. A reasonably clear
picture of the nature of combined
distributions may be obtained by
plotting the percentages of Ss at-
taining the highest possible score, the
lowest possible, and the score at the
midpoint of the range. Figure 1
shows such a composite curve for the
Cr and CrEx data from six researches
(2, 3, 25, 27, 31, 39), and a similar
curve of Ex scores from two studies
(32, 39).¹¹

¹¹ In some instances, the experimenter re-
ported the combined frequencies for the two
extreme scores at either end of the range. He
did not separate them. The present writer
divided the frequency evenly between the two
scores in those cases. In view of the over-all
data, this procedure probably tends to mini-
mize the frequency of Ss at the extremes.

In the former group, 31.8 per cent
of 442 Ss received the lowest possible
score, which is zero in all instances.
An additional 20.6 per cent attained
the highest possible score, while only
[...] per cent fall at the midpoints. More
than 50 per cent of all the Ss mani-
fest “all-or-nothing” rigidity. The Ex
curve is similar with 31.3 per cent
of 166 Ss having a zero score, [...]
per cent attaining the highest possi-
ble score, and 13.9 per cent at the
midpoints. Here, more than two- 
thirds of the Ss fall at the extremes. 
The difference between the percent- 
ages falling at the extremes in the 
Cr-CrEx studies and the Ex studies 
is probably a function of the more 
limited ranges of possible scores in 
the latter group. At best, we must 
conclude that fully half of all scores 
in WJT studies are likely to be found 
at the ends of the distribution of pos- 
sible scores. Distributions will thus 
tend to be markedly U shaped, defi- 
nitely nonnormal. 
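
The three-point summary used for Figure 1 is easily computed for any set of WJT scores. The sketch below is only an illustration: the function names and the frequency list are hypothetical, and the definition of "U shaped" adopted in it (both end frequencies exceeding every interior frequency) is an assumption made for the example, not a criterion stated in the studies reviewed.

```python
def three_point_summary(counts):
    """Percentages of Ss at the lowest, middle, and highest possible score.

    counts[i] is the number of Ss obtaining score i on a 0..k scale.
    For an even number of score points the lower middle score is used.
    """
    total = sum(counts)
    middle = (len(counts) - 1) // 2
    return tuple(round(100.0 * counts[i] / total, 1)
                 for i in (0, middle, len(counts) - 1))


def is_u_shaped(counts):
    """True when both end frequencies exceed every interior frequency."""
    interior = counts[1:-1]
    return bool(interior) and min(counts[0], counts[-1]) > max(interior)


# Hypothetical frequencies for a 0-4 scale (not data from any study reviewed).
example = [32, 10, 8, 9, 21]
low, mid, high = three_point_summary(example)
print(low, mid, high, is_u_shaped(example))

# Share at the two extremes for the Cr-CrEx studies, from the percentages in the text.
print(round(31.8 + 20.6, 1))
```

For the Cr-CrEx studies the two reported percentages sum to 52.4, which is the basis for the statement that more than half of the Ss fall at the extremes.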

To summarize this section, the 
available evidence indicates clearly 
that the WJT is deficient as a psy-
chometric tool in three important re-
spects: (a) no reliability coefficient
can be estimated for it, (b) about 25
per cent of Ss originally sampled
must be discarded along the way, and
(c) it leads to skewed distributions of
scores, precluding the use of para-
metric statistical analyses.


METHODOLOGICAL SHORTCOMINGS 
OF THE WJT STUDIES 


The WJT seems to be so poor an
instrument that an incautious ex-
perimenter will be easily led to com-
mit errors of design and analysis.
Researches with the WJT are rife
with such flaws, some of which have
been mentioned in previous sections
of this paper. The most common of
these is the use of parametric analy-
ses, which are usually inappropriate,
as the discussion of the last section
shows. Statistics like the t test, r, and
biserial r were used in 23 studies.¹²
Other errors include the use of Ss
who failed to solve the specified num-
ber of set problems in the crucial
phase, and failure to increase small
N’s to compensate for losses when
individual test administration was
used.

¹² Use of the F ratio with nonnormal distri-
butions is not always inappropriate in view of
evidence (33) that the distribution of F is in-
sensitive to the shape of the parent distribu-
tions of variates involved.

There were a number of other 
shortcomings which are not directly 
attributable to the test itself. Among 
these were the use of chi square with 
nonindependent frequencies or with 
very small theoretical frequencies, 
failure to correct small chi-square 
frequencies for continuity, inappro- 
priate use of one-tailed tests of sig- 
nificance, failure to use the over-all 
F test when more than two groups 
were involved, incorrectly stated 
probability levels, failure to adjust 
the probability level when variances 
were heterogeneous, and inadequate
explanations of the arrangement of 
data for statistical analysis. 

That the evidence reviewed here 
fails to demonstrate the validity of 
the WJT as a rigidity measure ap- 
pears to be an unchallengeable con- 
clusion. However, many of the studies 
are methodologically poor, so that it 
is possible to argue that the WJT has 
not yet been subjected to sound in- 
vestigation, and that any conclusion 
should hence be held in abeyance. 
The adoption or rejection of this 
stand is left to the reader’s discre-
tion.


SUMMARY AND CONCLUSIONS 


Thirty-one correlational studies
involving the water-jar Einstellung
test and criterion measures were re-
viewed. Although there are various
forms of the test, and various meas-
ures derived from it, no one was pre-
dictively superior to the others. Stu-
dies using the extinction problem as
a measure of rigidity obtained no
better results than those using the
critical problem, or combinations of
problems. Only five studies of the
31 report positive results. About 75
per cent of over 200 individual cor-
relations are not significant; the
average of 18 correlations between
the WJT and the California E and F
scales is .07. Brown’s hypothesis
that the relationship between au- 
thoritarianism and rigidity is a func- 
tion of stress or ego-involving condi- 
tions is not borne out by the data. 
The average of seven correlations 
computed from data obtained under 
stress is only .05. 

Analysis of the relationships be- 
tween the WJT and the Rorschach, 
measures of emotional adjustment, 
concept formation, reasoning, and 
perceptual and motor tasks indicates 
that no individual index has a clear 
or consistent association with the 
WJT. An analysis of nine studies of
intelligence and the WJT leads to the
tentative conclusion that there is a
consistent, low negative relationship
between the WJT and intelligence.
This conclusion is supported by two
factor-analysis studies, both of which 
place the WJT in factors heavily 
loaded with intelligence, though not 
in factors termed rigidity. 

Five noncorrelational studies of the 
WJT in experimental stress condi- 
tions are reviewed and criticized. The 
results of at least three must be re- 
garded as ambiguous, while those of 
Cowen (12, 13) indicate that both
stress and praise may increase “ri-
gidity” on the WJT. The correla-
tional studies of the WJT and stress
in which comparative descriptive
data are presented do not support
the conclusion that WJT “rigidity”
increases under stress. The study of
Sivers (51) suggests that though test
performance may be impaired by
stress, the impairment is unrelated
to scores on the WJT. A theoretical
explanation of the effect of stress on
WJT scores is offered. This explana-
tion derives from learning experi-
ments and learning theory, and at-
tempts to show that increases in WJT
“rigidity” under stress have no bear-
ing on the validity of the WJT as a
rigidity index.

Three deficiencies of the WJT as a
psychometric instrument are dis-
cussed. It is concluded that (a) a
reliability coefficient cannot be esti-
mated for the test, (b) about one in
every four Ss in an original sample
will be eliminated from the crucial
experimental phases due to the
standards of performance required in
preliminary stages, and (c) the WJT
tends to produce nonnormal, usually
U shaped distributions of scores. A
number of methodological defects in
the studies reviewed were also pointed
out.

The conclusions of this review can
be summarized pithily in two state-
ments:

1. After eight years of research,
evidence for the validity of the water-
jar test as a measure of rigidity is
still lacking.

2. The water-jar test is a poor psy-
chological test qua test.


THE NEURAL QUANTUM THEORY OF
SENSORY DISCRIMINATION¹


JOHN F. CORSO 
Pennsylvania State University 


From the time of classical psychophysics,
the phi-gamma hypothesis
has been widely accepted (7, 13, 15,
18, 22, 39, 41). The hypothesis
states that, in a psychophysical experiment,
the relationship of the proportion
of observer responses to stimulus
values is accurately described by
the integral of the normal probability
curve (41). Recently, however, this
hypothesis has been directly challenged
by the neural quantum² theory
of sensory discrimination (37, 38, 45).

Broadly stated, the theory of the 
neural quantum may be considered as 
an attempt to explain a paradox in 
modern sensory psychology. How 
can a continuous change in environ- 
mental energy give rise to a (seem- 
ingly) continuous change in sensory 
experience, when it is generally
agreed that sensory mechanisms are
composed of discrete neural elements
which follow the all-or-none law of
physiology? The paradox may be resolved
in one of two ways: experimental
evidence must be obtained
which demonstrates either that (a)
sensory nerve action is continuous


¹ This work was supported by a research
grant NSF-G1285 from the National Science
Foundation.

² The term “quantum,” as used in the
present paper, has a meaning entirely different
from Planck’s (20) quantum in physical
theory, Hecht, Shlaer, and Pirenne’s (23)
quantum in visual theory, and Gabor’s (19)
quantum in auditory theory. In each of these
instances, the quantum refers to a unit of
physical energy; here it refers to a functionally
distinct unit in the neural mechanisms which
mediate sensory experience. Hence, “quantum”
in the present sense implies a perceptual,
rather than physical, unit.


or that (b) the (apparent) contin- 
uum of sensory experience is discrete. 
The classical phi-gamma hypothesis 
assumes the first alternative, since 
psychometric functions are typically 
found to be smoothly sigmoidal in 
form. The more recent quantal hy- 
pothesis assumes the second alterna- 
tive, since some psychometric func- 
tions have been obtained which are 
linear in form. In general terms, the 
latter findings have been interpreted 
as an indication that the change in 
nervous activity which leads to a 
discriminatory response proceeds in 
a stepwise manner by definite incre- 
ments or quanta and that these 
quanta are directly reflected in the re- 
sponse itself. 

Since the question of the best 
mathematical formula for represent- 
ing a psychometric function, together 
with its underlying implications, is of 
central importance in the area of psy- 
chophysics, it would appear that 
some detailed attention should be 
directed toward the newer theoretical 
developments. The purpose of the 
present paper, therefore, is to present 
a complete account of the theory of 
the neural quantum of sensory dis- 
crimination and to re-examine the
theory in the light of experimental evidence
accumulated since its inception.


EARLY NOTIONS OF 
SENSORY QUANTA 
In 1919, Titchener (42), in discussing
the problems of measuring the
stimulus and differential limens,
stated that once the “nervous machine”
was started and adequate
stimulation was continued, sensation
would follow continuously the changes
of stimulus. While not expressed in
quantal terms, the basic notion underlying
the discontinuities observed
at the stimulus and differential limens
was that every sense organ offered a
certain amount of “frictional resistance”
to the stimulus. Supposedly,
this resistance had to be overcome before
a corresponding change in the
sensation would result.³

Boring (8) in 1926 attacked the
problem of sensory experience more
directly and pointed out that any
theory based upon specific energies
of nerves is a theory of sensory
quanta. In hearing, according to this
view, a new pitch is produced when
the sound stimulus activates a different
neural element. Furthermore,
the sensory continuum reduces to a
finite number of small steps, or
quanta, corresponding to the number
of discrete responsive elements
in the given sense organ. This, essentially,
was the problem Helmholtz
(24) tried to solve years earlier when
he compared the number of discriminable
pitches with the number of
rods in the organ of Corti.


In the absence of experimental
data demonstrating the existence of
pitch quanta, Boring (8) rejected the
resonance theory of pitch and supported,
instead, a frequency theory
in which pitch depended upon the frequency
of neural impulses and was,

³ A somewhat similar view has been expressed
more recently by Licklider (30, p.
1001), who contends that “in the simplest
conceptual neurology, the stimulus threshold
owes its existence to the effect of a small
barrier . . . between successive stages in the
neural processes that underlie hearing. . . . If
the DL (Difference Limen) is more than a
statistical artifact, the neural mechanism
must function in a stepwise or quantal
manner.”


therefore, nonquantal. Loudness, 
however, was related to the number 
of nerve fibers activated by the stim- 
ulus, thereby making it quantal. 

While not considering sensory
quanta directly, Troland (43) pointed
out the tendency toward “quantum
theorizing” about the processes of the
nervous system. It was suggested
that “the all-or-none principle, as applied
to nerve activity, forces us to
think of the latter in terms of fixed
units of influence” (43, p. 37).

In 1930, Békésy (1) presented the
first experimental evidence which indicated
that, with appropriate techniques,
discrete sensory steps could
be obtained, at least in the field of
hearing. This was accomplished by
presenting a standard tone of 0.3 sec.
duration, followed immediately by a
comparison tone of the same duration
but of variable intensity. The observer
reported whether or not he
heard a difference between the two
tones. The data of this study, plotted
as percentage judged different against
ΔI/I, yielded rectilinear functions
which were interpreted as indications
of the quantal nature of differential
sensitivity to intensity. Apparently,
Békésy (1) was able to minimize sufficiently
the “extrinsic variability”⁴
in the experimental situation such
that the “true” mechanism of sensory
discrimination was finally revealed.

In 1936, Békésy (2) obtained further
evidence on the quantal nature
of sensory functions. In this case, the
minimum audible pressures for pure
tones were determined from about
cycles per second (cps) to 50 cps by
alternately increasing frequency and
decreasing intensity. When the
audibility curve was plotted, steplike
discontinuities occurred at fairly regular
intervals between 4 cps and 50
cps, with the most prominent step at
18 cps.

⁴ “Extrinsic variability” refers to the variability
in factors outside the specific part of
the sensory nervous system critically involved
in making the required discriminations in a
given experiment, e.g., changes in criteria.


THE THEORY OF THE 
NEURAL QUANTUM 


The theory of the neural quantum
in audition was made explicit by 
Stevens, Morgan, and Volkmann 
(38) and is derived from the assump- 
tion that the basic neural processes 
which mediate pitch and loudness dis- 
crimination operate on an all-or-none 
principle. These processes are as- 
sumed to involve neural structures 
which are divided into functionally 
distinct units or quanta. A further 
assumption of the theory states that 
a stimulus-increment will be discrim- 
inated whenever it excites one quan- 
tum more than the number of quanta 
excited by the standard stimulus at a 
given moment.⁵

On the basis of the assumption of
the existence of neural quanta, sensory
discrimination data would be
expected to yield a rectilinear psychometric
function. This would be
accomplished theoretically in the following
manner. Suppose that a certain
stimulus excites completely a
given number of quanta and that no
stimulus energy, or residual, exists
after this neural excitation has been
accomplished. Then let stimulus-increments
be added to the predetermined
stimulus under specified experimental
conditions of pitch or
loudness discrimination. If the
neural units are stable and constant,
the increments will not excite the additional
quantum required for discrimination
until their magnitude
reaches a certain size. Thereafter,
each time that the increment is added
to the standard stimulus, a just noticeable
difference (j.n.d.) in pitch or
loudness should occur. If the data of
this theoretical model were presented
in the form of a psychometric function,
as shown in Fig. 1, with percentage
of response plotted against incremental
magnitude, 0 per cent response
would be obtained up to a
certain point on the stimulus scale
and 100 per cent response would be
obtained for all increments above
this point. These expectancies, however,
are not evidenced in auditory discrimination
data and, consequently,
an additional assumption on threshold
variability must be introduced.

⁵ When this assumption is met, the observer
is said to have adopted a “one-quantum” criterion
of discrimination. Usually, however,
for reasons to be indicated in the development
of the theory, such a criterion is difficult to
establish and the observer will require that the
stimulus-increment excite two additional
quanta before a discrimination is reported.

This assumption holds that the
over-all sensitivity of the human organism
does not remain at a constant
level, but fluctuates momentarily and
randomly⁶ through magnitudes considerably
larger than a single quantum.
It follows, therefore, that the
amount of stimulus energy required
to activate a fixed number of neural
units will vary with fluctuations in
sensitivity. Conversely, a stimulus
of given magnitude will excite a varying
number of neural units. Since the
variation in the number of activated
units is assumed to be quantal, or
stepwise discontinuous, all of the
available energy of a given stimulus
will not necessarily be utilized in a
given presentation. Thus, at a particular
moment, the given stimulus
may excite completely a certain number
of quanta and leave a small
amount of residual energy which
“partially” excites an additional
quantum.⁷ This residual, while ineffective
by itself to activate the
next quantum, becomes available for
summation with the energy provided
in the succeeding stimulus-increment
and may consequently produce a discriminatory
response.

The notions of the theory of
the neural quantum which have been
introduced up to this point are schematically
represented in Fig. 2. Two
continua are shown: (a) a stimulus
continuum with an arbitrary scale,
and (b) a sensory continuum with discrete
neural units. On the stimulus
continuum, S is the magnitude of the
standard stimulus; ΔSq is the magnitude
of the stimulus-increment
which will always excite an additional
quantum; and ΔS is the amount of
energy (magnitude of the stimulus-increment)
which is required to activate
a “partially” excited neural
unit. On the sensory continuum, p
is the amount of “partial” excitation
resulting from the presentation of a
given S.

⁶ This assumption appears in accord
with the available data of Montgomery (33)
and Lifschitz (31) which indicate that the
sensitivity of the ear approximates a normal
distribution as it varies with time. These
fluctuations are presumably due to extraneous
factors, such as breathing movements, extra-loud
heart beats, lapses of attention, shifts in
motivation, etc.

⁷ This notion of “partial” excitation follows
the work of Stevens, Morgan, and Volkmann
(38). While such a notion is inconsistent with
a quantal function, it does aid in the conceptualization
of the theoretical model and,
hence, will be retained in the present paper.
A restatement of the concept in more precise
stimulus terms would in no substantial way
alter the theory.

If Fig. 2 is taken to represent the
condition of loudness discrimination,
a stimulus magnitude of 17 energy
units is considered sufficient to stimulate
completely the neural elements
“a,” “b,” and “c.” Neural element “d” is
only “partially” stimulated by the
residual energy beyond 15 units. Assume
that such a situation would
yield a given loudness. If the stimulus
energy were reduced to 15 units,
there would be no apparent change
in loudness; but, if the energy were
reduced to 14 units, neural element
“c” would drop out and the loudness
would diminish by one j.n.d. Likewise,
if the energy were increased to
19 units by introducing a 2-unit stimulus-increment,
no change in loudness
would result. At 20 units, however,
the loudness would increase by
one j.n.d.
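The arithmetic of this example can be checked with a short sketch. The quantum size of 5 energy units is an assumption inferred from the worked numbers above (elements complete at 15 and 20 units); it is not stated in the text, and the function name is hypothetical.

```python
# Illustrative sketch of the Fig. 2 example. QUANTUM_SIZE = 5 is an
# assumption inferred from the worked numbers in the text.
QUANTUM_SIZE = 5  # energy units needed to excite one neural unit (assumed)


def excitation(stimulus_energy, quantum_size=QUANTUM_SIZE):
    """Return (number of completely excited units, residual energy p)."""
    full_units, residual = divmod(stimulus_energy, quantum_size)
    return full_units, residual


# Loudness changes only when the count of completely excited units changes.
for energy in (14, 15, 17, 19, 20):
    units, p = excitation(energy)
    print(f"S = {energy:2d} units: {units} units excited, residual p = {p}")
```

Under this assumption, 15, 17, and 19 units all excite three elements, 14 units excites two, and 20 units excites four, which matches the one-j.n.d. steps described above.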

Two features of the diagram in Fig.
2 should be noted specifically: (a) the
size of the neural quantum is measured
in terms of ΔSq, and (b) a “partially”
excited unit can be stimulated
by adding to S an increment (ΔS)
smaller than the amount (ΔSq) required
for stimulation when no “partial”
excitation (p) exists. Fluctuations
in sensitivity can, therefore,
bring about discriminatory responses
to increments smaller than one of
neural unit size. This would account
for the absence of the perpendicular
psychometric function (see Fig. 1)
predicted on the single assumption of
the existence of neural units.

As already indicated, a stimulus-increment
(ΔS) smaller than one of
neural unit size (ΔSq) will excite an
additional quantum only when the
residual energy (p) is sufficiently augmented
by the increment to provide
a total supply of energy equal to or
greater than that required for the activation
of a “complete” neural unit.
Obviously, when the residual is large,
the increment required is small; when
the residual is small, the increment
required is large. Thus, at any instant,
the magnitude of the stimulus-increment
necessary to add another
quantum to the total number excited
by the standard stimulus depends
upon the amount of residual energy
or “partial” excitation. Stated in
mathematical form:

$$\Delta S = \Delta S_q - p \qquad [1]$$

[Fig. 2. Schematic representation of the basic notions involved: a stimulus continuum and a sensory continuum of neural units, showing the neural units completely excited by S.]


where ΔS is the stimulus-increment
required to activate an additional
quantum; ΔSq is the size of the increment
which will always excite one
quantum; and p is the amount of
“partial” excitation elicited by the
surplus energy in the standard stimulus
(S).

Equation 1 indicates that a given
ΔS will completely stimulate the additional
quantum needed for a discrimination
whenever ΔS ≥ ΔSq − p.
As the size of the increment becomes
greater, an increase in the number of
discriminations is to be expected.
The precise manner in which the
number of discriminations increases,
in terms of their relative frequencies
of occurrence, can be arrived at by the
following logical analysis. Assume, as
before, that the over-all fluctuation
in the sensitivity of the organism is
large as compared to the size of a
neural quantum. This fluctuation
will produce a variation in the number
of neural units excited completely
by the standard stimulus. Since the
amount of surplus or residual energy
cannot exceed neural unit size, it
must always stimulate “partially”
the first unit beyond the last one
stimulated by the standard stimulus.
The surpluses will be spread out,
therefore, over the same range of
neural units stimulated during the
course of fluctuations in sensitivity.
The relative frequency with which
the surpluses are distributed over this
range will be dependent upon the
time distribution of the organism's
sensitivity. Assuming, as before,
that the organism fluctuates in sensitivity
because of a large number of
unknown, independent factors, the
distribution of surpluses over the
range described will approximate a
normal curve.

[Fig. 3. Distribution of surplus values arising from random fluctuations in organismic sensitivity.]

This “chance” distribution of surpluses
is shown in Fig. 3. The abscissa
has been arbitrarily divided
into six equal neural units which represent
the range over which the organism
fluctuates in sensitivity.
Each neural unit has been subdivided,
also arbitrarily, into ten
equal surplus values. Thus, a surplus
of a given magnitude may be
found to occur in each of the six
neural units. The ordinate of the
distribution represents the theoretical
relative frequency of occurrence
of the surplus values. For example,
let the surplus value equal 0.3 of a
neural unit. The vertical lines drawn
in the distribution will indicate the
relative frequency of occurrence of
this surplus value within each of the
six neural units. Notice that this relative
frequency is not the same from
unit to unit.

The probability function of the
surplus values can now be determined.
This is accomplished by summating
over the several neural units
covered by the normal distribution of
surpluses resulting from the organism’s
fluctuations in sensitivity, the
relative frequencies of occurrence for
each possible surplus value from zero
to neural unit size. From the obtained
distribution, the probability
of a given surplus value may be determined.

A graphical derivation of the prob- 
ability function of surplus values can 
be demonstrated by utilizing the 
representation in Fig. 3. Accordingly, 
Fig. 4 shows the function obtained by 
summating, over the six neural units, 
the relative frequencies of occurrence 
of the individual surplus values rang- 
ing from zero to one in 0.1 neural unit 
steps. For example, the segmented 
vertical line in the diagram of Fig. 4 
represents the summation for the 
surplus value equal to 0.3 of a neural 
unit. Each segment of the line, start- 
ing at the bottom, corresponds in 
length to the appropriate ordinate 
shown in Fig. 3. The same procedure 
has been followed to obtain the sum- 
mation value for each of the nine re- 
maining surpluses. Since the form of 
the obtained distribution is approxi- 
mately rectangular, and the neural 
unit is divided into ten equal parts, 
the probability for each surplus is 
the same. Thus, it may be said that,
given a standard stimulus, any surplus
value is as likely to occur as any
other. A similar conclusion may be
arrived at mathematically by Bayes’
(16) theorem, which states that the
distribution of the probability integrals
of any continuous curve is a
rectangle with every probability between
zero and one equally likely.
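The graphical summation just described can also be checked numerically. The following is a minimal sketch, not the author's procedure: it assumes that sensitivity is normally distributed with a standard deviation of one neural unit (so that roughly six units are spanned, as in Fig. 3), takes the surplus as the fractional part of the momentary sensitivity level, and tallies it in ten equal classes. The function name and parameter values are hypothetical.

```python
import random


def surplus_histogram(n_samples=100_000, sd_in_units=1.0, bins=10, seed=0):
    """Relative frequency of each surplus class, pooled over all neural units.

    Assumption: the momentary sensitivity level is normally distributed,
    expressed in neural units; the surplus is its fractional part.
    """
    rng = random.Random(seed)
    counts = [0] * bins
    for _ in range(n_samples):
        level = rng.gauss(0.0, sd_in_units)
        surplus = level % 1.0  # fractional part, between 0 and 1
        index = min(int(surplus * bins), bins - 1)  # guard against rounding to 1.0
        counts[index] += 1
    return [count / n_samples for count in counts]


for k, freq in enumerate(surplus_histogram()):
    print(f"surplus {k / 10:.1f} to {(k + 1) / 10:.1f}: {freq:.3f}")
```

With these assumed values each class should come out close to 0.1, i.e., the pooled distribution is approximately rectangular even though the frequency of a given surplus differs from unit to unit.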

On the basis of the preceding analysis,
the form of the psychometric
function may be predicted. It has
already been shown that the number
of responses to an increment is a
function of the size of the increment;
the greater the increment, the greater
will be the number of responses. The
rate at which the number of responses
increases with the increase in the size
of the increment can be determined
from the frequency of occurrence of
the surplus values. Since the probability
function is rectangular, one
value of surplus occurs as frequently
as any other. Therefore, for a given
increase in the size of the increment,
the proportion of surpluses which can
be augmented to neural unit size, or
greater, is always the same. The
rate of increase in the number of responses
is, then, constant and the relationship
between increment size
and percentage of response is clearly
linear.

[Fig. 4. Probability function of surplus values.]

Such a psychometric function may
be graphically represented by a
straight line, i.e., the integral of the
rectangular probability distribution
of surplus values. The zero point
for this function should correspond to
the value of the standard stimulus,
since any increment, no matter how
small, will find a surplus which it can
augment to neural unit size, provided
the increment is presented a
sufficient number of times. The 100
per cent point should correspond to
the smallest increment which always
succeeds in exciting an additional
neural unit. This increment, which
must be independent of the surplus
since it always produces a discriminatory
response, provides a measure
(ΔSq) of the size of the neural quantum.

The foregoing statements of the
theory of the neural quantum can be
summarized in the form of mathematical
equations. Equation 1 has
already been formulated (ΔS = ΔSq − p)
and indicates that an additional
quantum will be activated whenever
the amount of energy in an increment
is sufficient to augment the surplus
energy to a neural unit amount. Since
the surplus (p) fluctuates between
0 ≤ p ≤ ΔSq and any value of p is as
likely to occur as any other, the proportion
of times that an increment
will activate an additional neural
unit is given by:

$$f_1 = \frac{\Delta S}{\Delta S_q} \qquad [2]$$

where f₁ is the relative frequency of
the instants during which ΔS excites
one additional quantum; ΔS and ΔSq
have the same meanings as previously
given.

Two features of the relationship
expressed in Equation 2 should be
noted: (a) the proportion of responses
increases as a linear function
of incremental size and (b) the value
of f₁ may vary from zero to one.
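Equation 2 can be illustrated by a small simulation. This is a sketch under the stated assumptions only: the surplus p is drawn from a rectangular distribution between zero and ΔSq, and an increment is counted as discriminated whenever ΔS ≥ ΔSq − p (Equation 1). The function names and the trial count are hypothetical.

```python
import random


def f1(delta_s, delta_sq):
    """Equation 2, limited to the range zero to one."""
    return min(max(delta_s / delta_sq, 0.0), 1.0)


def simulate_f1(delta_s, delta_sq, trials=100_000, seed=1):
    """Proportion of trials on which the increment excites one more quantum."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        p = rng.uniform(0.0, delta_sq)  # surplus: every value equally likely
        if delta_s >= delta_sq - p:     # condition of Equation 1
            hits += 1
    return hits / trials


for delta_s in (0.1, 0.3, 0.5, 0.9):
    print(delta_s, f1(delta_s, 1.0), round(simulate_f1(delta_s, 1.0), 3))
```

The simulated proportions should agree with Equation 2 to within sampling error, rising in direct proportion to the size of the increment.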

Equations 1 and 2, however, hold
only for those discrimination situations
in which the excitation of a
single additional quantum is sufficient
to produce a response; but the
evidence of Békésy (1), Miller and
Garner (32), and Blackwell (4) shows
that usually two additional units
must be activated before a discriminatory
response is reported. This is
attributed to the fluctuations in sensitivity
which occur during the presentation
of the standard stimulus.
Since these fluctuations may produce
surplus values of neural unit size, the
subject finds it difficult to distinguish
this excitation from that resulting
from the presentation of an adequate
increment. If, as indicative of an
increment, the subject adopts a certainty
criterion which can be met
only when two additional quanta are
excited, he is then able to distinguish
the effect of the surplus alone from
the combined effect of increment and
surplus. In this case, the proportion
of times that a given increment will
produce a discriminatory response
may be expressed as follows:

$$f_2 = \frac{\Delta S}{\Delta S_q} - 1 \qquad [3]$$

where f₂ is the relative frequency of
occurrence of the instants during
which ΔS excites two additional
quanta. Observe that f₂ may also
vary between zero and one.

Equation 3 may be rewritten in
terms of the percentage (P) of the increments
to which an observer will
be able to make a discriminatory response.
In this form,

$$P = \left(\frac{\Delta S}{\Delta S_q} - 1\right) \times 100, \qquad [4]$$

and P may vary between 0 per cent
and 100 per cent.

Referring to Equation 4, increments
less than quantal in size can
never stimulate two additional quanta
since the surplus cannot exceed one
unit; hence, the combined energy of
increment and surplus will be less
than that required for two units and
no discriminatory response will occur.
With increments of quantal size or
greater, discriminatory responses will
occur and will increase in the same
manner as described in the case of
the “one-quantum” criterion. One
hundred per cent response will occur 
at the smallest increment value which 
can always excite two additional 
quantal units. This value, which is 
independent of surplus, will be twice 
the size of the largest increment to 
which a response never occurs. This 
prediction follows from the assump- 
tion previously made that neural 
units are equal. The largest incre- 
ment to which a response “just 
never” occurs is taken as a measure 
of the size of the first quantal unit; 
the smallest increment to which a re- 
sponse always occurs is taken as the 
size of two quantal units. Conse- 
quently, on the assumption of equal 
units, a two-to-one ratio obtains be- 
tween the value at which the psy- 
chometric function reaches 100 per 
cent and the value at which it first 
departs from 0 per cent. 
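A minimal sketch of Equation 4 makes the two-to-one relation concrete. The function name is hypothetical, and the limiting of P to the range 0 to 100 per cent is added here to reflect the statement that P may vary only between those values.

```python
def percent_two_quanta(delta_s, delta_sq):
    """Equation 4: percentage of increments discriminated under a
    two-quanta criterion, limited to the range 0 to 100 per cent."""
    p = (delta_s / delta_sq - 1.0) * 100.0
    return min(max(p, 0.0), 100.0)


# With a quantum of unit size, the function leaves 0 per cent at
# delta_s = 1 and reaches 100 per cent at delta_s = 2.
for delta_s in (0.5, 1.0, 1.25, 1.5, 1.75, 2.0, 2.5):
    print(delta_s, percent_two_quanta(delta_s, 1.0))
```

The increment at which the function reaches 100 per cent is twice the increment at which it departs from 0 per cent, whatever value is assumed for ΔSq.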

If the experimental conditions and 
underlying assumptions of the quan- 
tum theory are satisfied, a typical 
psychometric function such as that 
for pitch or loudness discrimination 
should resemble the function pre- 
sented in Fig. 5. Two features of the 
function should be observed: (a) 
there is a linear relationship between 
the percentage of increments heard 
and the magnitude of stimulus-in- 
crements presented, and (b) there is a 
two-to-one ratio between the values 
of the function at the 100 per cent 
point (two quanta) and the 0 per cent
point (one quantum). These features
of the psychometric function are the 
two specific deductions of the neural 
quantum theory of sensory discrimi- 
nation which can be subjected to ex- 
perimental verification. 


QUANTAL PREDICTIONS AND 
TECHNIQUES OF EVALUATION 


The first major prediction of the
theory of the neural quantum is that
the percentages of stimulus increments
discriminated will be distributed
rectilinearly between 0 per cent
and 100 per cent. Stated symbolically,
P will be a linear function of
ΔS. The classical hypothesis, as previously
indicated, would predict a
sigmoidal probability function for the
same set of data. The question becomes,
then, “which of the two hypotheses,
sigmoidal or quantal, better
fits these data points?” (35, p. 61).
To answer this question, the best-fitting
sigmoidal and rectilinear functions
must be constructed for the
given set of data.⁸ While any one of
several techniques may be employed,
the curve-fitting process is most adequately
accomplished by using either
the method of least squares or the
more recently developed technique
of probit analysis (21). When the
best-fitting sigmoidal and rectilinear
functions have been obtained, it will
usually be found that neither curve
gives a perfect fit, i.e., passes through
all the data points. Thus, an appropriate
statistical test of goodness
of fit, such as chi square (28), must
be applied to determine the probability
that the experimental data
could have been obtained by chance
when the “true” function was either
rectilinear or sigmoidal in nature.
The results of this analysis will indicate
whether the specific theoretical
hypothesis being tested should be rejected
or retained.

⁸ At least one investigator (11) has assumed
that threshold data may be fitted by a log-Gaussian
distribution, i.e., an ogive expressed
in terms of a logarithmic scale of stimulus
magnitude. Since the normal ogive and log-Gaussian
distribution are highly similar, no
special case will be made in the present paper
for this additional hypothesis.
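The comparison of the two hypotheses can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any study reviewed here: the straight line is fitted by ordinary least squares on the observed proportions, the normal ogive is fitted by a crude grid search over its mean and standard deviation (substituted here for probit analysis), and the chi-square statistic assumes the same number of presentations at every increment. All function names and the data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ogive(x, mu, sigma):
    """Integral of the normal probability curve."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def fit_ogive(xs, ys, steps=60):
    """Grid search for the ogive with least squared error; returns (mu, sigma)."""
    low, span = min(xs), max(xs) - min(xs)
    best = None
    for i in range(steps + 1):
        mu = low + span * i / steps
        for j in range(1, steps + 1):
            sigma = span * j / steps
            sse = sum((ogive(x, mu, sigma) - y) ** 2 for x, y in zip(xs, ys))
            if best is None or sse < best[0]:
                best = (sse, mu, sigma)
    return best[1], best[2]


def chi_square(observed, predicted, n_trials, eps=1e-3):
    """Pearson chi square over 'heard' and 'not heard' at each increment."""
    total = 0.0
    for obs, pred in zip(observed, predicted):
        pred = min(max(pred, eps), 1.0 - eps)  # avoid division by zero
        total += n_trials * (obs - pred) ** 2 / (pred * (1.0 - pred))
    return total


if __name__ == "__main__":
    # Hypothetical proportions heard at seven increments, 100 trials each.
    xs = [1.1, 1.25, 1.4, 1.5, 1.6, 1.75, 1.9]
    ys = [0.08, 0.27, 0.41, 0.52, 0.58, 0.77, 0.88]
    slope, intercept = fit_line(xs, ys)
    mu, sigma = fit_ogive(xs, ys)
    print("linear:", chi_square(ys, [slope * x + intercept for x in xs], 100))
    print("ogive: ", chi_square(ys, [ogive(x, mu, sigma) for x in xs], 100))
```

Each statistic would then be referred to a chi-square table with the appropriate degrees of freedom (the number of increments less the number of fitted parameters) before either hypothesis is rejected or retained.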

The second major prediction of
the quantum theory is that the smallest
stimulus-increment at which 100
per cent discrimination occurs will
be twice as large as that at which 0
per cent discrimination occurs. This
holds when an observer has
adopted a “two-quanta” criterion of
discrimination.⁹ However, regardless of
the judgmental criterion adopted,
the quantal index may be defined in
the general case as follows:

$$QI = \frac{\Delta S_1}{\Delta S_1 - \Delta S_0} \qquad [5]$$

where QI is the quantal index; ΔS₁ is the size of the
smallest stimulus-increment at which
100 per cent discrimination occurs;
and ΔS₀ is the largest stimulus-increment
at which 0 per cent discrimination
occurs. Under a “two-quanta”
criterion, ΔS₁ always excites two additional
quanta, and ΔS₀ always excites
one additional quantum.¹⁰ Since
the quantal units are assumed to be
equal, QI will equal two. For any
quantal criterion adopted, Equation
[5] will yield an integral value of QI.

⁹ An observer may be found
to require that three additional quanta be
excited by the stimulus-increment for discrimination
to occur. Under this
“three-quanta” criterion of discrimination, no
discriminations will occur until the increment
is sufficient to excite two additional quanta, and
discriminations will occur 100 per cent of the
time when the increment is sufficient to excite
three additional quanta. Since this
does not alter the basic formulation of the
quantum theory, it will not be treated independently
in the present paper.

In the computation of QI, the values
of the stimulus-increments
used in Equation [5] are obtained
by solving, algebraically or graphically,
for the 100 per cent and 0 per
cent discrimination points in the linear
functions fitted to the experimental
data.
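The algebraic solution can be sketched in a few lines. The sketch assumes that the fitted rectilinear function is expressed as P = slope × ΔS + intercept, with P in per cent; the function name and the example coefficients are hypothetical.

```python
def quantal_index(slope, intercept):
    """Equation [5], from a fitted line P = slope * delta_s + intercept."""
    delta_s0 = (0.0 - intercept) / slope    # increment at 0 per cent
    delta_s1 = (100.0 - intercept) / slope  # increment at 100 per cent
    return delta_s1 / (delta_s1 - delta_s0)


# A line obeying Equation 4 with a quantum of unit size: P = 100 * delta_s - 100.
print(quantal_index(100.0, -100.0))  # two-quanta criterion
# The corresponding line for a three-quanta criterion: P = 100 * delta_s - 200.
print(quantal_index(100.0, -200.0))
```

For an empirically fitted line the index will rarely be an exact integer, so the question in practice is how closely the obtained value approaches the integer predicted by the theory.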


SOME REQUIREMENTS OF THE
QUANTAL METHOD


The demonstration of neural quanta
apparently depends upon very rigorous
experimental controls. If the relatively
large, momentary fluctuations
in over-all organismic sensitivity are
not to obscure the “true” nature
of the discriminatory process, certain
precautions must be taken. As
stated by Stevens, Morgan, and
Volkmann (38, p. 319), “we must
add ΔI instantaneously, and remove
it before the organism is able to
change in sensitivity by more than a
negligible amount.” This requirement
dictates certain experimental
procedures: (a) there must be no
time interval between the presentation
of the standard stimulus and the
variable stimulus, and (b) the variable
stimulus must be of very short
duration. If these conditions are not
satisfied, the random fluctuations in
over-all sensitivity may be expected
to result in nonrectilinear psychometric
functions (38).

Other requirements have also been
specified by Stevens, Morgan, and
Volkmann (38). (a) The observer
must experience little difficulty in
making discriminatory judgments.
This presupposes that the observer is
well trained and that the experimental
situation maximally aids the
focusing of attention and the stabilization
of judgment criteria. (b) The
judgments must be made rapidly
enough to eliminate the need for
averaging results from different experimental
sessions. If possible, all
judgments should be made in a single
session, thus minimizing the effects
of temporal variations. (c) Some observers
may be aided in directing
their attention by introducing a
“warning” signal, such as a dim
light,¹¹ at the proper moments in the
test trials. This technique enables
the observer to adjust to the series of
stimulus presentations and serves to
reduce the fatigue of sustained attention.
(d) No transient sounds must
be introduced in the transitions between
the standard stimulus and the
comparison stimulus. If such sounds
are present, they may be used as extraneous
cues and will tend to distort
the resulting psychometric function.


EXPERIMENTAL TESTS AND SOME
CRITICAL COMMENTS¹²


Following the earlier work of
Békésy (1, 2), Stevens and Volkmann
(37) tested the hypothesis that
loudness discrimination data could
be adequately represented by a linear
function in accordance with the requirements
of the quantum theory.
The experimental techniques employed
the precautions already outlined
in the preceding section of this
paper. A single trained observer was
used at a frequency of 100 cps presented
at five (20, 30, 50, 60, and 80
db) sensation levels (SL, db above
threshold). At each SL, the observer
listened to a continuous tone whose
intensity was increased for 0.15 sec.
at 3 sec. intervals. The task of the
observer was simply to press a key
whenever an increment was heard.
Each increment was presented between
50 and 100 times in random
blocks of 25 presentations each. The
obtained percentages of perceived
judgments ranged from 0 per cent
to 100 per cent.

¹¹ Some observers, however, find the light
distracting, and prefer to make judgments
without this auxiliary cue (32, 38).

¹² These comments are not to be construed
as criticisms of individual authors or of journals,
but are intended to point out some
aspects of the experimental findings or of data
analysis which may aid in appraising the
present status of the neural quantum theory.

Since the obtained psychometric 
functions showed the predicted recti- 
linearity and the two-to-one integral 
relation, it was concluded that the 
data supported the quantum theory 
of discrimination. However, two fea- 
tures of the data analysis should be 
considered: (a) the two-to-one inte- 
gral relation was obtained on the 
basis of visually fitted psychometric 
functions, and (b) no tests of goodness
of fit of these functions were re- 
ported. 

Stevens, Morgan, and Volkmann 
(38) later extended the preceding 
study, using six trained observers in 
pitch discrimination. The procedure 
employed was essentially the same as 
in the case of loudness discrimination. 
A total of 100 judgments was made 
by each subject at each of several 
(eight to ten) frequency increments, 
all of which were less than 10 cps. 
The standard stimulus was a 1,000 
cps tone at 54 db SL presented in 
random blocks of 25 trials each. 
Functions were also obtained for a 
single observer at five sensation levels
(16, 25, 46, 64, and 90 db), and for
another observer at four sensation
levels (25, 30, 54, and 80 db).
In the treatment of data, linear 
functions were fitted to the experi- 
mental values for pitch discrimina- 
tion by the method of least squares 
and phi-gamma functions were fitted 
to the same values by Boring’s (6)
method.¹³ For purposes of curve fitting,
Δf was taken as the independent
variable and all points falling below 
3 per cent and above 97 per cent were 
omitted in computing the constants 
of the fitted functions. For each of 
the fitted functions, a chi-square test 
of goodness of fit was applied and the 
corresponding P value was deter- 
mined. For both types of functions, 
the number of degrees of freedom 
was taken to be two less than the 
number of points to be fitted. The 
results of this analysis showed that 
in 14 of the 15 sets of data, the P 
values were higher for the rectilinear 
functions than for the phi-functions 
of gamma. In general, the P values 
for the functions predicted by the 
quantum theory were above 0.5, 
whereas those for the “classic” theory 
were less than 0.5. Furthermore, the 
two-to-one integral relation was found 
to hold rather well in most of the 15 
sets of data, but the values ranged 
from 1.89 to 2.34. 
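
The two criteria applied in this analysis, rectilinearity and the two-to-one integral relation, may be illustrated by a minimal computational sketch. It assumes an unweighted least-squares line and omits points below 3 per cent and above 97 per cent, as described above. The function names `fit_line` and `integral_ratio` are hypothetical, and the increments and proportions are synthetic values invented for the illustration; they are not data from any of the studies reviewed.

```python
# Illustrative sketch only: synthetic values, not data from any cited study.

def fit_line(xs, ps):
    """Unweighted least-squares fit of p = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_p = sum(ps) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxp = sum((x - mean_x) * (p - mean_p) for x, p in zip(xs, ps))
    slope = sxp / sxx
    intercept = mean_p - slope * mean_x
    return intercept, slope


def integral_ratio(xs, ps, lo=0.03, hi=0.97):
    """Ratio of the increment at the fitted 100 per cent point to the
    increment at the fitted 0 per cent point.

    Points with proportions below `lo` or above `hi` are omitted before
    fitting, following the 3 and 97 per cent rule quoted in the text.
    """
    kept = [(x, p) for x, p in zip(xs, ps) if lo <= p <= hi]
    a, b = fit_line([x for x, _ in kept], [p for _, p in kept])
    x_at_0 = -a / b            # increment where the line crosses p = 0
    x_at_100 = (1.0 - a) / b   # increment where the line crosses p = 1
    return x_at_100 / x_at_0


# Hypothetical increments (arbitrary units) and proportions perceived.
increments = [0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2]
proportions = [0.0, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.0]

ratio = integral_ratio(increments, proportions)
print("ratio of 100 per cent to 0 per cent intercepts:", round(ratio, 2))
```

For these synthetic points the retained values lie on a single straight line, so the ratio comes out at the two-to-one value the theory predicts; with real data the ratio would be compared against 2 in the same way as the range of 1.89 to 2.34 reported above.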


While these data obviously favor 


13 This method 
observations accor: 
44) which contain 
weights and Urb 


served data to the phi- 
Urban weigh 
emphasis on 
solving for the constants of the phi-gamma
function by the method of least squares.

the quantum theory, a re-examination
of Table I in Stevens, Morgan,
and Volkmann (38, p. 329) shows
that the phi-function of gamma,
despite yielding generally lower P
values, is considered unacceptable in
only one of the 15 fits according to 
Culler’s (12) interpretation. Nine 
of the phi-gamma functions have a fit 
described as “good” or better,
compared to 14 rectilinear functions in
the same classification. It would
appear, therefore, that on the basis of
the individual chi-square values
obtained in this study both hypotheses
remain tenable.

In an attempt to demonstrate a
more decisive difference between the
goodness of fit for the classical and
quantal hypotheses, a composite of
the individual P values was also
computed by Stevens, Morgan, and
Volkmann (38). Since the composite
P value for all 15 sets of data taken
together was 0.931 when the
rectilinear functions were fitted and only
0.008 when the phi-gamma functions
were fitted, the quantal hypothesis
was considered supported and the
classical hypothesis was considered
quite unacceptable.

Flynn (17), however, has made
three criticisms of the treatment of
data in the Stevens, Morgan, and
Volkmann (38) study: (a) disregarding
those points below 3 per cent and
above 97 per cent was considered
unjustifiable when the fitting was done
to compare rectilinear and phi-gamma
hypotheses, since the critical aspects
of this comparison involve these
extreme values; (b) although the
observations were weighted for
reliability by Urban’s¹⁴ method in
determining the best-fitting normal ogives,
no weighting was reported in fitting
the rectilinear functions; and (c) n − 3
degrees of freedom (df) should have
been used for the ogive fit. Flynn
(17) concludes, nevertheless, that
the general finding is probably
correct that 14 out of the 15 sets of data
are fitted better by a rectilinear
function than by a phi-gamma function.

14 See footnote 13.

Lewis and Burke (27) have also 
pointed out certain weaknesses in 
the application of the chi-square test 
to the Stevens, Morgan, and Volk- 
mann (38) data. (a) In comparing 
the goodness of fit of the two differ- 
ent functions, the same quantity was 
not minimized in the process of ob- 
taining the constants for the fitted 
functions. In fitting the linear
functions, the sum of squared differences
between observed and theoretical
proportions was minimized; but, in
fitting the phi-gamma functions, the
sum of the squared differences
between observed and theoretical
values of gamma was minimized. This
would tend to yield chi-square values
for the phi-gamma functions that
are inexact and probably inflated
by an unknown amount. (b) In the
analysis of the pitch discrimination
data for the six observers at 1,000
cps, 54 db SL, four extreme empirical
proportions were excluded in
determining the constants of the fitted
phi-gamma functions, but were
included in the calculations of chi
square for individual observers. This
procedure would also tend to inflate
the composite value of chi square. (c)
There were seven theoretical
proportions representing small theoretical
frequencies (less than 10) which
should preferably have been combined
with adjacent proportions.
When these factors were considered
in the calculation of new values of
chi square for the phi-gamma functions
of the six observers, the composite
chi square was 11.76, with 10
df, as compared to the original value
of 33.80, with 21 df.¹⁵ Since the
recalculated value of chi square falls at
about the 30 per cent level of
confidence, the phi-gamma hypothesis
cannot be rejected.
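
The probability levels attached to the two composite chi-square values quoted above can be checked with a short sketch. It evaluates the upper-tail probability of the chi-square distribution for integer degrees of freedom from the standard closed-form series, using only the Python standard library. The function name `chi2_sf` is hypothetical, and the sketch is offered as a check on the arithmetic, not as the procedure used by Lewis and Burke (27).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with `df` degrees of freedom > x).

    Uses the finite series that holds for positive integer df.
    """
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term = 1.0
        total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: normal two-sided tail plus a finite correction series.
    tail = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + total


# Composite values quoted in the text for the phi-gamma functions.
p_recalculated = chi2_sf(11.76, 10)   # recalculated composite, 10 df
p_original = chi2_sf(33.80, 21)       # original composite, 21 df

print("recalculated composite P:", round(p_recalculated, 3))
print("original composite P:", round(p_original, 3))
```

The recalculated value should come out near 0.30, in agreement with the statement above, while the original value of 33.80 with 21 df lies between the 2.5 and 5 per cent levels.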

In a study on pitch discrimination, 
Flynn (17) used three trained ob- 
servers to listen to a continuous 1,000 
cps tone at a 55 db SL. The tone, 
lasting 1.25 min. per block of 25 
trials, periodically changed in fre- 
quency for 0.30 sec. The task of the 
observer was to report each time 
this change in frequency was de- 
tected. The number of trials for each 
increment was usually 50 or 100, with 
extremes of 35 and 200. Thirty sets 
of data were obtained from the three 
observers. For each set of data, the 
best-fitting normal ogive and the 
best-fitting straight line were com- 
puted by the method of least squares. 
Precautions were taken to weight the 
empirical proportions in fitting both 
of these functions so that any ob- 
served differences in goodness of fit 
could not be attributed to differences 
in fitting techniques.¹⁶

On the basis of chi-square tests of 
goodness of fit as adapted by Thom- 
son (40), 16 sets of data were found 
to fit neither a straight line nor a 


ons for the chi-square 
functions had the same 
ose outlined for the phi- 
except that none of the 
s omitted during the 
ss were later included in the 
f fit. Lewis and Burke (27) 
hat the linear functions 


ichted proportions. 
ban 


weights, corre n 0 
vations, were used; for the straight lige e 

the Urban weights were needed and appli . 
17 There were two exceptions to Thomson's
(40) procedure: (a) no attempt was made to
compare chi-square values by means of the
standard error, and (b) the number of df for
the ogive was the number of percentages
minus 3; for the straight line, minus 2.

normal ogive. Four sets of data were
found to fit a normal ogive better 
than a straight line, while the re- 
maining ten sets of data could be 
fitted better with a straight line than
with a normal ogive. However, some 
of the weaknesses in data analysis 
mentioned by Lewis and Burke (27) 
were also evident in this study, e.g., 
the 0 per cent and 100 per cent points 
were omitted in determining the best 
fitting curves but were reintroduced 
in testing for goodness of fit. In 
addition, the two-to-one criterion de- 
manded by the quantum theory did 
not hold in those cases in which linear 
functions were obtained. Thus, it 
would appear that the evidence of
this study, contrary to Flynn’s (17)
interpretation, may be considered as 
failing to support the neural quan- 
tum theory since both predictions 
were not met. 

Koester and Schoenfeld (26), while 
comparing the relative merits of 
quantal and nonquantal procedures 
in pitch discrimination, were also 


interested in duplicating the quantal 
findings.


For each observer, a complete set 
of psychometric data was obtained 
by the quantal method on each of 
four days. It was concluded that 
none of the data exhibited either the 
rectilinearity or the integral relation 
predicted by the quantum theory.
However, no mention was made of
fitted functions or tests of goodness 
of fit. Also, since each of the eight 
psychometric functions was based on 
either six or seven points with only 
twenty observations per point, the 
conclusions of this study should be 
accepted with caution.
In a study designed to investigate
some of the factors which might
obscure the quantal nature of the
discriminatory mechanism, Miller and
Garner (32) obtained intensity
discrimination data for a 1,000 cps tone
at 40 db SL on two observers using
both the standard quantal procedure
of Stevens, Morgan, and Volkmann
(38) and a modified quantal
procedure. In the modified procedure,
the stimulus-increment was altered
at random after each presentation
and the observer was not permitted
to stop after every 25 presentations.
This modification was introduced to
prevent the observer from establishing
a fixed “two-quanta” criterion
of judgment.
The results obtained by the standard
quantal method showed that the
two psychometric functions could be
adequately represented by a straight
line fitted by the method of least
squares and that the predicted
integral relation was closely
approximated. For the modified procedure,
the phi-gamma functions were fitted
by the technique proposed by
Guilford (22); the quantal hypothesis
was evaluated by fitting a series of
three straight lines to the empirical
values lying between the successive
quantal points as determined by the
previously-administered standard
method. On the basis of chi-square
tests of goodness of fit, it was
concluded that the quantal hypothesis
provided a better description of the
data than did the phi-gamma
hypothesis.
While these findings are indeed
significant, two aspects of data an- 
alysis should be pointed out: (a) the 
question arises as to whether data as- 
sumed to have been obtained under 
three different certainty criteria and 
fitted by three different linear func- 
tions can be validly “connected” to 
yield a single psychometric function; 
and (b) assuming that a single func- 
tion is valid, each of the straight lines 
at the two extremes of the function 
must be made to intersect an addi- 
tional horizontally-linear segment if 
the function is not to predict impossi- 
ble response values greater than 100 
per cent and less than 0 per cent. 
By further analyzing the data of
Stevens and Volkmann (37) and of
Flynn (17), Miller and Garner (32)
proceeded to show that (a) the
proposed technique of fitting three linear
functions to psychometric functions
holds in general for those cases
involving criterion shifts by the
observer and is not limited to loudness
discrimination or to the random
method of stimulus presentation,
and that (b) combining of data ob- 
tained either under different experi- 
mental conditions or from different 
observers tended to yield psycho- 
metric functions in accordance with 
the phi-gamma hypothesis. Thus, 
the work of Miller and Garner (32) 
tends to support the quantum theory 
and serves to isolate some of the 
factors responsible for nonquantal 
findings. 

In a fairly extensive study, Corso 
(10) recently attempted to test the 
hypothesis that the data obtained in 
the auditory discrimination of fre- 
quency and intensity satisfied the 
conditions predicted by the theory 
of the neural quantum. For intensity 
discrimination, the general procedure 
followed that of Stevens and Volk- 
mann (37); for frequency discrimina- 


tion, that of Stevens, Morgan, and 
Volkmann (38). In all, 20 subjects 
were used in the study, after having 
been screened from a larger group of 
45 by means of an audiometric test 
and the Seashore pitch and loudness 
tests. Each subject was given two 
separate practice hours (for a total 
of 425 to 1,225 judgments) under the 
specific conditions of the test trials. 
Five subjects were tested under each 
of the following conditions of fre- 
quency discrimination: (a) 1,000 cps 
at 20, 40, 60, and 80 db SL, and (b)
300, 1,000, and 3,000 cps at 60 db 
SL. A similar pattern was followed 
for intensity discrimination. For 
each test condition, at least six stimu- 
lus increments were presented, with 
approximately 200 judgments being 
made at each increment-value. 

In the analysis of data, linear func- 
tions were fitted to the individual 
sets of frequency and intensity dis- 
crimination data by the method of 
least squares, with all empirical pro- 
portions greater than 0.97 or less 
than 0.03 omitted. The chi-square 
test was used to test the goodness of 
fit of each obtained function. In the 
calculation of chi-square values, all 
theoretical proportions greater than 
0.97 and less than 0.03 were appro- 
priately combined with adjacent pro- 
portions. This technique insured that 
in calculating the chi-square values 
(a) no proportions representing the- 
oretical frequencies of less than five 
were used, and (b) no empirical pro- 
portions were used which did not 
the solutions of the param- 
eters of the linear functions. Of the 
puted, only 
quency discrimina- 
discrimination) 
1 to or greater 
than 0.05. Of the nine psychometric 
ich the hypothesis of 
d, only one had
a ratio-value which approached the
predicted two-to-one criterion. It 
was concluded, therefore, that the ex- 
perimental results in frequency and 
in intensity discrimination failed to 
satisfy the predictions of the theory 
of the neural quantum. 

Licklider (29), in reviewing Corso’s 
(10) study, pointed out that failing 
to obtain psychometric functions 
which conform to quantal predictions 
can only disprove the quantum 
theory if “all the non-physiological 
error variance” has been eliminated 
from the experimental measurements 
and the observer is worked at his 
“physiological limit.” Obviously, 
the existence or nonexistence of these 
qualifying conditions can (perhaps) 
never be known, but only assumed 
from the obtained data. One would 
expect, however, that in two essen- 
tially identical experiments such a 
source of error would be roughly 
equivalent, unless some unusual (and 
perhaps drastic) precautions were
taken in one experiment and not the 
other. 
The theory of the neural quantum
has been extended to include the
problem of sensory discrimination in
areas other than audition.

functions sing tO Psychometric 

g stimulus Pressure, as 

measured by an Elsberg olfactometer, 


In the 


ation in 
Jerome 


delivered. The task 
as one of distinguishin 
stimuli from a contr 
those from a citral bott 
ten presentations of 
from each bottle at ea 
(seven to nine) Pressur
Seven psychometric functions were
obtained from the data of the two ob- 
servers in the discrimination experi- 
ment, and six psychometric functions 
were obtained on two observers in the 
preliminary test of instructions. Lin- 
ear functions were fitted to each of 
these 13 sets of data by the method 
of averages. The results of this an- 
alysis showed that the criterion of 
rectilinearity was satisfactorily met; 
but, since the additional criterion of 
the integral relation was not met, it 
was concluded that the existence of 
a differential olfactory quantum was 
not demonstrated. 

There are three apparent weaknesses
in Jerome's (25) study if the
data are to be used to evaluate the
quantum theory: (a) a nonstandard
quantal procedure was used inasmuch
as an interval of 30 sec. was permitted
to elapse between stimulus
presentations to avoid olfactory fatigue;
(b) only ten observations were made at
the critical values at the extremes of
the psychometric functions,¹⁹ and
(c) no tests of goodness of fit were
reported, presumably due to the
presence of small theoretical
frequencies which precluded the use of
chi square. Thus, it appears that
the data of this study cannot form an
adequate basis either for the
acceptance or for the rejection of the
quantum theory.

DeCillis (14) attempted to follow
the quantal procedure in finding the
relation between amplitude of
stimulus movement over a cutaneous area
and frequency of positive response.

18 Since no tests of goodness of fit were
reported, it is presumed that rectilinearity was
determined by visual inspection of the fitted
functions.

19 For example, of the 13 functions obtained
ve had no observations at stimulus value 
yielding between 80 per cent and 100 per cent
response; seven had no observations between
0 per cent and 20 per cent response.

The procedure employed was to present
a fine air column, at a pressure
of 35 lbs. per sq. in., at a point on
the skin for 0.10 sec. The air column
then traveled across the skin at a
rate of 143 mm./sec. to another
point where it was again stationary
for at least 0.10 sec. After the air
column was turned off, the needle
controlling the stimulus returned to
its starting position. This procedure
was repeated 20 times in a series,
with the same amplitude of movement
presented on each trial. Three
subjects were used and sensitivity
was measured on the fingertip, arm,
and leg. The task of the observer
was to report “yes” whenever movement
of the air column was perceived
and “no” whenever it was
not.
The best-fitting straight lines were
calculated by the method of least
squares for 35 selected psychometric
functions, with the 0 per cent and 100
per cent points omitted in the
curve-fitting process. The chi-square test
of goodness of fit was applied
following the method of Brown and
Thomson (9). No attempt was
made to fit 16 sets of nonhomogeneous
data. The results of this analysis
yielded 20 chi-square values with
probability values equal to or greater
than 0.95, while 15 values were
smaller than this. It was concluded,
therefore, that it was not
“unreasonable to maintain that the
best-fitting psychometric function is
rectilinear” (14, p. 47). However, in
those cases where the hypothesis of
linearity was retained, the criterion
of the integral relation did not hold.
DeCillis (14, p. 49) contends that
“apparently the integral relation is
not to be expected in studies of
absolute sensitivity.”
It is unfortunate that in this study
extensive data were not collected
at those points on the psychometric
functions where maximal differences
between the phi-gamma and
quantal hypotheses were to be ex- 
pected. In the 20 out of 35 fitted 
functions in which the linearity hy- 
pothesis was retained, 19 functions 
had no observations at stimulus val- 
ues yielding between 0 per cent and 
15 per cent (or more) responses; 10 
functions had no observations be- 
tween 85 per cent (or less) and 100 
per cent responses. Thus, for a given 
set of data, since the 0 per cent and 
100 per cent points were also omitted 
in the curve-fitting process, the re- 
maining empirical points would prob- 
ably not have deviated from a 
straight line, whether or not the 
“true” function were ogival or linear. 

In the most recent attempt to eval- 
uate the quantal hypothesis, Black- 
well (4) obtained visual discrimina- 
tion data for four observers using 
normal binocular viewing and natural 
pupils at a luminance of 4.71 foot- 
lamberts. The stimulus was a circu- 
lar luminance-increment, subtending 
18.5’ located 7° to the right of the 
fixation spot and was presented for 
a duration of 0.06 sec. once every 
12.25 sec. Each psychometric func- 
tion was based on 14 to 18 incre- 
ment-values with 20 observations at 
each point. The increments were 
presented in random blocks of 20 
trials each and the same increment 
was used throughout all the trials of 
a block. The observer indicated dis- 
crimination by responding “yes” or 
“no” to each presentation of the 
stimulus. 

In the data analysis, a linear func- 
tion was fitted to a selected set of 
data for each observer.²⁰ The selection
of a specific set of data from

20 The exact method of curve fitting is not
specified in Blackwell (4), but the use of
probit analysis is implied.

among the total number of sets avail-
able (Observer 1: 4 sets; 2, 18 sets; 
3, 24 sets; and 4, 10 sets) was carried 
out in such a way as to obtain the 
function most adequately fitted by a 
quantal curve. The results of this 
analysis revealed that the data of 
two observers could not be used to 
evaluate the quantal hypothesis. 
For one of these subjects, the stimuli 
were spaced too widely over the criti- 
cal psychometric range; for the other, 
the data were extremely scattered. 
The data of the remaining two sub- 
jects were fitted adequately in most 
cases by either a “two-quanta” or a
“three-quanta” curve.

Blackwell (4), however, considers 
this apparent confirmation of the 
theory as spurious. This assertion 
is based upon the fact that, as the 
criterion of judgment increases from 
two quanta to three quanta, the 50 
per cent threshold decreases rather
than increases as expected from an
extension of the quantum theory.
The hypothesis is advanced that
“response channelization” (5) may
actually be responsible for the fact
that the experimental data appear
to exhibit quantal rectilinearity.
It is concluded (4) that the visual
threshold-data obtained by the stand- 
ard quantal procedure do not confirm 


the predictions of the neural quantum 
theory. 


DISCUSSION


While the predictions of the theory
of the neural quantum are specific:
(a) rectilinearity of the psychometric
function, and (b) an integral relation
between the values of the
stimulus-increments at the 100 per cent and
0 per cent response points on the
psychometric function, the experimental
task to evaluate these predictions is
extremely difficult. As Blackwell
(4, p. 398) states, “Essentially, the
quantum theorists have so restricted
the allowable conditions of measure- 
ment and the analysis of data that it 
is difficult to obtain an unambiguous 
evaluation of the theory.” Licklider 
(29, p. 99) has aptly summarized the 
difficulties to be encountered in at- 
tempting to obtain negative evidence 
on the theory of the neural quantum 
by saying that “it is a shame that 
the quantum theory has such strong 
built-in self-protection.”
In the first place, the applicability 
of the theory is restricted to data col- 
lected under one psychophysical pro- 
cedure: the quantal method (21). 
This is unlike the usual approach to 
psychophysical research where one 
or more methods may be appropriate 
for the investigation of a given prob- 
lem. Within the quantal method, 
Blackwell (4) objects to the
“phenomenal report” as the only
“indicator-response” and cites data (3) to
support his contention that the
“forced-choice” technique²¹ is more
adequate than the phenomenal report
in threshold measurements under
routine conditions. It is maintained
that the use of “forced-choice” as
the “indicator-response” tends to
minimize session-to-session variability
when practiced observers are
used. 
A second restriction placed on the
data-collection process is that
stimulus increments must be grouped into
blocks of presentations of the same
magnitude. Miller and Garner (32)
have demonstrated clearly that the
random ordering of stimulus
magnitudes prevents even the well-trained

21 The “forced-choice” technique is defined
by two conditions: (a) the observer “is
required to indicate discrimination by correctly
identifying some verifiable attribute of the
stimulus such as its spatial location or
temporal interval; and (b) he is required to select
an answer on each stimulus-presentation
even if he has to guess” (4, p. 398).

observer from adopting a stable
criterion, but the interpretation of the
single(?) function obtained is not too 
clear. Blackwell (4) argues that the 
nonrandom block presentations pro- 
vide the observer with the oppor- 
tunity to respond in an invalid man- 
ner. It is maintained that the pres- 
ence of “positive response channel- 
ization” and of “negative response 
channelization” will operate to dis- 
tort threshold data into a form re- 
sembling that required by the quantum
theory.²² Senders and Sowards
(36), in a study in which judgments
were made of the simultaneity of
a light and a tone,
also found that successive
presentations of the stimulus near threshold
tended to produce long series of iden-
tical responses. 
Koester and Schoenfeld (26, p. 11)
take a position somewhat similar to
Blackwell's (4), contending that “in
the course of prolonged practice the
observers may learn to adjust . . .
their certainty criteria to
the values of the comparison
stimuli in such a way as to yield the
necessary rectilinearity in the
psychometric function and the integral
relation.”
o i (35, p. 64) raises the same 
question on methodology by asking:
“If the observer knows that all
increments in a given series are going to
be identical, is it not possible for him
to set up a subjective standard on the
basis of which he graduates the
frequency of his responses?” Some
evidence on this point is reported in the

22 “Positive response channelization” is
defined as an increase in the number of
perceived judgments toward the ends of the
blocks of stimuli for which the predominant
response was positive. “Negative response
channelization” is defined as an increase in the
number of nonperceived judgments toward
the ends of the blocks of stimuli for which the
predominant response was negative.


study of Senders and Sowards (36) in 
which it was found that the observers 
tended to adjust their proportion of 
responses in accordance with their 
expectations. 

A third restriction which must op- 
erate in quantal studies is that only 
data collected in a single experi- 
mental session may be used to test 
the quantal predictions. Miller and 
Garner (32) have demonstrated that 
two sets of data obtained from the 
same observer by the same procedure 
but at different times will average 
into a typical sigmoidal distribution, 
even though the separate functions 
are rectilinear. Nevertheless, such a 
restriction makes it practically im- 
possible to establish the adequacy of 
the quantum theory with any high 
degree of confidence. Through a 
special application of the chi-square 
test, Blackwell (4) has shown that no 
matter how many sets of experi- 
mental data are available, at least 40 
presentations must be made in each 
experimental session at each of 10 
stimulus-values if the normal ogive 
is to fail to fit the data, even though 
the data may actually conform to the 
specific requirements of the quantum 
theory. In the studies reviewed in 
the present paper, most functions 
were based on less than 10 stimulus- 
values each and many functions uti- 
lized 25 or less observations per ex- 
perimental point. Apparently, under 
these conditions, unequivocal results 
could not have been expected. An 
additional comment is made by Lewis 
and Burke (27) who indicate that, 
in data analysis, the elimination of 


extreme proportions makes it
impossible to obtain a critical evaluation
of the phi-gamma hypothesis through
the use of the chi-square test of good- 
ness of fit. 
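
The effect of averaging across sessions described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each session yields a rectilinear two-quanta function rising from 0 per cent at one quantum to 100 per cent at two quanta, and that the size of the quantum differs between the two sessions. The quantum sizes and the function names `quantal_function`, `averaged_function`, and `segment_slope` are hypothetical and are not taken from Miller and Garner (32).

```python
def quantal_function(increment, quantum):
    """Rectilinear two-quanta function, clipped to the range 0 to 1.

    The proportion perceived is 0 at one quantum and 1 at two quanta.
    """
    proportion = (increment - quantum) / quantum
    return min(1.0, max(0.0, proportion))


def averaged_function(increment, quanta):
    """Mean of the rectilinear functions for several sessions."""
    values = [quantal_function(increment, q) for q in quanta]
    return sum(values) / len(values)


def segment_slope(x1, x2, quanta):
    """Slope of the averaged function between two increments."""
    rise = averaged_function(x2, quanta) - averaged_function(x1, quanta)
    return rise / (x2 - x1)


# Hypothetical quantum sizes (arbitrary units) for two sessions.
session_quanta = [1.0, 1.4]

low_slope = segment_slope(1.1, 1.3, session_quanta)    # lower part
mid_slope = segment_slope(1.5, 1.9, session_quanta)    # middle part
high_slope = segment_slope(2.1, 2.7, session_quanta)   # upper part

print("slopes (lower, middle, upper):",
      round(low_slope, 3), round(mid_slope, 3), round(high_slope, 3))
```

Under these assumptions the averaged function is steepest in its middle range and flatter toward both ends, which is the general shape of a sigmoidal function, although each separate function is a straight line over its own rising range.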

There is some evidence to show 
that, perhaps, the restriction on the
combining or averaging of experi-
mental data is unjustified. Koester 
and Schoenfeld (26) have shown 
that threshold data remain fairly 
consistent from day to day; Corso 
(10) found no differences among data 
collected in two successive hours; 
Myers and Harris (34) have reported 
that the fluctuation of the auditory 
threshold is approximately less than 
1 db for relatively short periods of
time. It should be recalled, however, 
that the assumption of momentary 
and random changes in the over-all 
sensitivity of the observer is funda- 
mental to the derivation of the quan- 
tum theory. Thus, any compromise 
on this restriction would undoubtedly 
necessitate a major revision of the 
theory. Nevertheless, since the in- 
tegral relation predicted on the as- 
sumption of relatively large fluctua- 
tions in sensitivity has seldom been 
obtained, it may be that in the final 


analysis this assumption may prove 
to be unwarranted.


Volkmann (38, p. 334) that some of 
the stimulus-incr, 
ceived as 


Ki than increments heard only 


f, as developed 
Y, “discrimina- 


quantal uni 
when it is added only 20% 
time as compared with į 
added 80% of the time?” 
While this might well be a critical 
issue for the theory of the neural 
quantum, its resolution is not readily
apparent, despite Jerome’s (25) con-
tention that evidence was obtained 
for the existence of a “magnitude 
quantum” of olfactory sensitivity. 
Wever (46) has taken the position 
that peripheral quanta have not, as 
yet, been demonstrated and are not 
to be expected on the basis of the vol- 
ley theory of hearing. According to 
the volley theory, pitch is considered 
to be a continuous function of volley 
frequency for low and intermediate 
tones, and of spatial pattern for high 
tones; loudness is considered to be a
continuous function of the magnitude
of auditory nerve discharge which
depends upon the number of active
fibers and the rates of fiber activity.
It should be recalled, however, 
that Stevens, Morgan, and Volkmann
(38) hypothesize that (a) the
neural quantum appears at a central,
not a peripheral, locus, (b) it is
functional rather than anatomical, and
(c) it involves a number of fibers
rather than a single fiber. Three
arguments are offered to support this
view as opposed to Békésy’s (1)
contention that the quantal unit is the
individual nerve fiber: (a) the
quantum for the individual observer has
no fixed magnitude, (b) for a given
sensory attribute, the number of
auditory nerve fibers is greater than
the number of quanta, and (c) the
binaural quantum is approximately
two-thirds the size of the monaural
quantum. It should also be realized
that the neural units of Stevens,
Morgan, and Volkmann (38), whether 
or not substantiated, are hypotheti- 
cal constructs and do not specify the 
neural correlates of sensory attri- 
butes (35). 
One final point remains to be considered.
The derivation of the theory
of the neural quantum is based on
the assumption of two quantitative
variables: (a) a physical continuum
and (b) a psychological continuum.²³
The modern theory of psychophysics,
however, assumes three parallel
quantitative variables: (a) a physical
continuum, (b) a sensory response
continuum, and (c) a judgment
continuum (21). According to this
schema, the notions of neural quanta
are most directly related to continuum
(b). However, there is a considerable
amount of experimental evidence
which shows that the regression
relating the judgment continuum
to the sensory response continuum
is not always linear, and neither
is the correlation always perfect.
Thus, in the psychometric function,
which relates continuum (c) to
continuum (a), it would be possible to
obtain a curve unlike that which
relates continuum (b) to continuum
(a). In other words, quantal functions
at continuum (b) might not
appear in a psychometric function
unless a perfect, linear relationship
between (c) and (b) could be
obtained under certain conditions of
[...] controls, attention,
attitude of the observer, stability
[...], etc. On the other hand,
it is conceivable that quantal functions
might characterize continuum
(c) independently of the quantal
or continuous character of continuum
(b). Thus, even if the existence
of a quantal psychometric function
were unequivocally established,
the interpretation of the processes
underlying the function would not
be immediately apparent.


²³ The term “continuum” is considered in a
broad sense and permits a quantum theory for
either variable.


SUMMARY AND CONCLUSIONS


The present paper has attempted
to fulfill two primary objectives: (a)
to present a complete and detailed
account of the theory of the neural
quantum of sensory discrimination,
and (b) to review the literature on the
quantum theory in order to assess
the current status of the theory in
the light of the total experimental
evidence now available.

It has been shown that from the 
theory of the neural quantum two 
specific hypotheses about the psycho- 
metric function may be derived and 
tested: (a) that a linear relationship 
obtains between the size of stimulus- 
increments presented and the per- 
centage of responses observed, and 
(b) that an integral relation obtains 
between the stimulus-increment val- 
ues of the function at the 100 per 
cent and 0 per cent points-of-re- 
sponse. Data in support of these hy- 
potheses would indicate that the 
fundamental processes involved in 
sensory discrimination are discrete or 
quantal in character. 
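
The two hypotheses can be made concrete with a small sketch. It assumes the two-quantum response criterion usually associated with the theory; that criterion, the quantum size `Q`, and the function name are illustrative assumptions and are not stated in this section. Under those assumptions the predicted function is a straight line rising from 0 per cent at one quantum to 100 per cent at two quanta, so the two end points stand in an integral (2:1) relation.

```python
def quantal_response_proportion(increment, quantum, criterion=2):
    """Predicted proportion of positive responses for a stimulus increment.

    Hypothetical illustration: `criterion` is the number of quanta the
    increment must excite before the observer responds (assumed to be 2).
    The prediction is linear between (criterion - 1) and criterion quanta.
    """
    if quantum <= 0:
        raise ValueError("quantum size must be positive")
    proportion = increment / quantum - (criterion - 1)
    # Clip to the range of a proportion: never below 0, never above 1.
    return min(1.0, max(0.0, proportion))


Q = 10.0  # assumed quantum size, in arbitrary stimulus units
for inc in (5.0, 10.0, 12.5, 15.0, 20.0, 25.0):
    print(inc, quantal_response_proportion(inc, Q))
```

With these assumed values the increment that is always reported (20 units) is exactly twice the largest increment that is never reported (10 units); a phi-gamma (ogive) function would show no such sharp end points.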

While the hypotheses derived from 
the quantum theory are experimen- 
tally verifiable, severe limitations in 
methodology and in statistical treat- 
ment of data make it extremely dif- 
ficult to evaluate the tenability of 
the hypotheses as opposed to the 
alternate views of the phi-gamma 
function. However, despite these 
limitations, it may be concluded that 
in certain investigations rectilinear 
psychometric functions have been 
obtained. The existence of the inte- 
gral relation, contrariwise, has sel- 
dom been demonstrated. Thus, when 
both factors are considered in the 
body of available evidence, it appears 
that unequivocal support of the 
neural quantum theory is, for the 
most part, lacking. In addition, the 
validity of judgments obtained under 
the experimental conditions of the 
quantal method has been seriously
questioned.

The present review of literature on
the quantum theory suggests the
need for future research along two 
major lines: (a) the development of 
a more satisfactory technique for 
statistically testing the goodness of 
fit of the quantal and phi-gamma 
hypotheses to a set of experimental 
data, and (b) the determination of 
the specific conditions under which
rectilinear psychometric functions
may be obtained in order to test
the validity and universality of
quantal notions. Until such research
is carried out, the issue of the neural
quantum theory of sensory discrimination
cannot be fully resolved.


TECHNIQUES FOR COMPUTING SHIFT
IN A SCALE OF ABSOLUTE JUDGMENT¹


KURT SALZINGER 


Columbia University²


The method of absolute judgment 
(single stimuli) has a long history. It 
started originally as a psychophysi- 
cal method, i.e., it was applied to 
physical stimuli. Since then, how- 
ever, it has come to be more widely 
applied, i.e., to stimuli which cannot 
easily be ordered on a physical con- 
tinuum. 

McGarvey describes the method 
of single stimuli in the following way: 
“The observer is simply presented 
with the members of a group of stim- 
uli one at a time and asked to render 
a judgment upon each by assigning 
it to one of a specified set of cate- 
gories” (7, p. 9). This assignment of 
stimuli to categories is sometimes 
referred to as a “naming response.” 

Investigators using the method of
absolute judgment have referred to
the observer's behavior as a formation
of a frame of reference (see Helson,
4, for example) in accordance
with which he responds to each of the
stimuli which he must judge. Experimenters
were also interested in discovering
how to modify the frame of
reference of an observer. This modification
of the frame of reference is
known as shift and has been brought
about by, among others, such variables
as anchor stimuli (stimuli not
originally included among the stimuli
presented to S) and social stimuli
(judgments rendered by another S).
It may be defined as a systematic
change in the categories to which an
observer assigns the members of a
group of stimuli or as a systematic
change in the stimulus values to
which he gives particular naming
responses.

Along with the study of the parameters
affecting absolute judgment
have come a number of different
techniques for the statistical treatment
of the data to arrive at a measure
of shift. In this paper, an attempt
will be made to review the
techniques used up to the present
time for computing the amount of
shift as well as to present two new
techniques.

Perhaps the most commonly used
technique for measuring shift is based
upon the use of ratings. In this
method numbers are applied to the
categories either during the experimental
situation or afterwards, and
these numbers are then treated as
an equal ratio scale. To evaluate
whether a shift in judgment has taken
place from one condition to another,
means, differences between means,
variances, and critical ratios of the
ratings are calculated. The technique
of ratings has been used by, among
others, Helson (4) for judgment of
weights as well as in the field of vision.
Both Heintz (3) and Brown (1)
apparently felt the need for justifying
the use of this method. The
former did so by finding fault with
other methods of data treatment (in
which incidentally he was justified),
and the latter justified the procedure
by appealing to the fact that Likert
(5) found high positive correlations
between scores based upon frequency
counts of judgments and arbitrary
ratings applied to the attitude items
of a questionnaire. Brown ignores
the fact that Likert used judgments
of different stimuli, i.e., Brown used
weights while Likert used verbal
stimuli. Furthermore, the fact remains
clear that this method of treating
data consists of the application
of numbers to events (judgments in
this case) without specifying the
operations in the experimental situation
that would be equivalent to the
operations involved when the numbers
are combined statistically. It
is here suggested that this method
cannot be used since the operations
performed with the numbers cannot
be performed with the events (judgments)
being quantified. In other
words, no evidence is available to
show that a rating of “4” is two times
as great as one of “2,” etc.

Likert (5) uses ratings indirectly.
He bases the values he assigns to
judgments on the following: he assumes
that the judgments are normally
distributed; then he determines
the value of each category by
converting the proportion of Ss giving
each judgment (or the proportion
of responses given by one S) to a
standard score, which in turn is
based on the assumption that the use
of standard deviations results in
equal interval scales of the judgment
continuum. This method is somewhat
long and necessitates a large
number of judgments or a large number
of judges to obtain reliable proportions.
In cases where one postulates
individual differences, it becomes
necessary to make separate
calculations of the numerical values
of each category of judgment for each
S. Such a procedure makes the already
long procedure longer. Furthermore,
the assumption of normality
cannot always be justified or
met.

A third method which makes use 
of only the assumption of an ordinal 
scale was utilized by Mausner (6). 
He took median values of each S’s 
judgments in two different situations. 
He then plotted Ss’ median judg- 
ments for groups of 20 trials, show- 
ing graphically what Ss shifted and 
under what conditions. While he did 
not apply any statistical tests to 
these scores (he did to different types 
of scores to be discussed below), this 
type of data lends itself to easy an- 
alysis by means of nonparametric 
tests like the median test, described 
in Mood (8). The point might be 
made here that the median is not a 
very discriminating measure espe- 
cially, for example, if only three judg- 
ment categories are used. It becomes 
more valuable with an increasingly 
greater number of categories. 

A fourth method of treating judg- 
ment data consists of computing the 
mean stimulus value to which a given 
judgment is applied under different 
conditions, e.g., under the usual conditions
(unanchored) and under anchoring
conditions. Shift can then be
evaluated by computing the relevant 
statistics. For example, the unan- 
chored and anchored conditions may 
be compared by testing for the sig- 
nificance of difference between the 
means of the stimuli for the two conditions.
As long as the stimuli being
measured are physical in nature and
continuous, and the application of
numbers to the events (stimulus values)
can be specified in terms of physical
operations equivalent to the statistical
manipulations, the problems of
the first method mentioned can be
successfully avoided. This method
was employed by, among others,
Tresselt and Becker (11) for the
judgment of length of lines. The
first disadvantage of this method ap- 
pears in the way these investigators 
utilized it. They compared the mean 
length of line characterized as me- 
dium under two different conditions, 
leaving out the same data for the 
lines characterized as long or short. 
This was done because the response 
medium was the most frequent one, 
with fewer long and short responses.
In the usual absolute judgment situation,
where quite often as many as
nine (see Helson [4]) judgments are
used by S, at least some of the stimulus
values equivalent to a given response
are indeterminate because
the response has not been employed
by S. This situation becomes even
more extreme in the “shifted” situation
where the effect is such as to
cause elimination of some of the
judgments used in the “preshift”
condition. It becomes obvious that
this method cannot be applied to all
judgments because all are not always
used, and even when all are used,
they do not occur with equal frequency,
thus resulting in different degrees
of reliability for the estimate
of different judgments. What Tresselt
and Becker (11) did to get around
this problem was simply to use that
one response which had the largest
frequency of stimulus values to estimate
the judgment value. While
this is a solution of a kind, it suffers
from the fact that only some of the
data can be used; furthermore, while
the amount of data discarded for a
three-point situation is not very
large, it increases as the number of
judgment categories increases.

A fifth method applied to judg- 
ment data is that of graphing the re- 
sults in terms of percentages. Wever 
and Zener (12) plotted the percentage 
of times different categories of judg- 
ment were used by an S against the 
stimulus values to show that the 
method of absolute judgment yields 
the same kind of results as the 
method of comparative judgment. 
Postman and Miller (9) presented 
their data in a similar and perhaps 
more revealing fashion. Placing the 
judgment categories on the abscissa, 
they plotted the cumulative percent- 
age of the occurrence of each cate- 
gory separately for each stimulus 
value; thus they arrived at as many 
curves for each condition as there 
were stimuli presented. This procedure
was followed for the shift and
preshift conditions so that these
could be compared to determine the
degree of shift. Presentation of the
distribution of judgments under different
conditions (e.g., unanchored
and anchored conditions) yields an
excellent view of the phenomenon of
shift. When graphing must be done
for many Ss it becomes unwieldy; if
groups are to be compared some index
of relationship between the curves becomes
necessary; and finally, since
these curves are plotted in terms of
percentage, a great number of presentations
becomes necessary for reliable
curves.

A sixth technique of treating the
method of single stimuli was originated
by Mausner (6). He made a
frequency distribution of the judgments
of each S under two different
conditions of judgment (“A”: S
judging alone; “T”: two Ss judging
in the presence of each other). He
then took the difference in frequency
of occurrence between the two situations
(“T” − “A”) for each category
of judgment, e.g., if the judgment
category “4” occurred 0 times in situation
“T” and 18 times in situation
“A” the difference for that category
was −18 (“T − A” score). Then he
found the midpoint of the judgment
categories, e.g., for an S who uses
judgment categories 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, the midpoint
would fall exactly between judgment
categories 8 and 9. Algebraic sums
of the T − A scores were then obtained
separately for all the judgment
categories above the midpoint,
$\sum (T - A)_{\text{above}}$, and
separately for all the judgment categories
below the midpoint,
$\sum (T - A)_{\text{below}}$. These two sums were then
totaled without respect to sign to
yield a shift score.
This method is based upon the following
line of reasoning: The phenomenon
of shift manifests itself in
an increase in frequency of use of one
half of the scale of judgment and a
corresponding decrease in frequency of
use of the other half of the scale. The
sum of the two subtotals,
$\left|\sum (T - A)_{\text{above}}\right| + \left|\sum (T - A)_{\text{below}}\right|$, is then
assumed to reflect the amount of shift.
This method of data treatment assumes
rank order of the judgment
categories but in using frequencies is
free from the objections raised against
the rating method.
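
A minimal sketch of this degree-of-shift score follows. The frequencies are invented for illustration (only the value for category 4, 0 in “T” against 18 in “A”, comes from the text), the function name is hypothetical, and the treatment of a category that falls exactly on the midpoint (it is ignored) is an assumption, since the text does not cover that case.

```python
def mausner_shift_score(freq_alone, freq_together):
    """Degree-of-shift score from two frequency distributions.

    freq_alone / freq_together map an ordered category number to the
    number of times S used it in condition "A" and condition "T".
    """
    categories = sorted(set(freq_alone) | set(freq_together))
    midpoint = (categories[0] + categories[-1]) / 2
    # T - A difference for every category (missing categories count as 0).
    diffs = {c: freq_together.get(c, 0) - freq_alone.get(c, 0)
             for c in categories}
    sum_above = sum(d for c, d in diffs.items() if c > midpoint)
    sum_below = sum(d for c, d in diffs.items() if c < midpoint)
    # The two algebraic sums are totaled without respect to sign.
    return sum_above, sum_below, abs(sum_above) + abs(sum_below)


# Hypothetical data for one S using categories 3 through 14.
alone = {3: 2, 4: 18, 5: 10, 6: 8, 7: 6, 8: 4,
         9: 1, 10: 1, 11: 0, 12: 0, 13: 0, 14: 0}
together = {3: 0, 4: 0, 5: 2, 6: 3, 7: 4, 8: 5,
            9: 8, 10: 9, 11: 8, 12: 6, 13: 3, 14: 2}

above, below, score = mausner_shift_score(alone, together)
print(above, below, score)
```

When both conditions contain the same number of judgments, the two algebraic sums are equal in size and opposite in sign, so the score is simply twice the net frequency that moved across the midpoint.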
Mausner (6) derived still another
score which he named the score of
direction of shift. In this method, he
counted the number of plus and
minus signs of the T − A scores, referred
to above, separately for the
judgment categories above and below
the midpoint. He took the difference
in frequency between the
positive and negative T − A scores
separately above and below the midpoint,
e.g., if S uses 12 judgment
categories (there are 6 above and
6 below the midpoint) and there
are 5 negative T − A scores and 1
positive T − A score below the midpoint,
this will result in a difference
score of 4 negative T − A scores; if
there are 6 positive and no negative
T − A scores above the midpoint,
then we will obtain a difference score
of 6 positive T − A scores. Remembering
that the phenomenon of shift
manifests itself in an increase in frequency
of use of one half of a scale of
judgment and a corresponding decrease
in frequency of use of the other
half of the scale, we must add positive
T − A differences above the midpoint
to negative T − A differences
below (negative ones above to positive
ones below the midpoint). If
there is a preponderance of positive
T − A differences above the midpoint
and/or a preponderance of negative
T − A differences below the midpoint,
then the direction of shift may be
characterized as upward. This was
true in the example given above since
the 6 positive T − A differences above
the midpoint must be added to the 4
negative T − A differences below the
midpoint to result in a direction of
shift score of +10 (where + indicates
an upward shift and − indicates a
downward shift).
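
The counting rule can be sketched as follows. The function name and the individual T − A values are hypothetical; only their signs (5 negative and 1 positive below the midpoint, 6 positive above) are taken from the example in the text.

```python
def direction_of_shift(diffs_below, diffs_above):
    """Direction-of-shift score from the signs of the T - A scores.

    Positive differences above the midpoint and negative differences
    below it both count toward an upward shift; the reverse pattern
    counts toward a downward shift. Zero differences are ignored.
    """
    pos_above = sum(1 for d in diffs_above if d > 0)
    neg_above = sum(1 for d in diffs_above if d < 0)
    pos_below = sum(1 for d in diffs_below if d > 0)
    neg_below = sum(1 for d in diffs_below if d < 0)
    return (pos_above - neg_above) + (neg_below - pos_below)


# Signs as in the worked example; the magnitudes are invented.
below = [-2, -18, -8, -5, -2, 1]   # 5 negative, 1 positive
above = [7, 8, 8, 6, 3, 2]         # 6 positive, 0 negative

print(direction_of_shift(below, above))
```

Because only signs are counted, the score can never exceed the number of judgment categories, which is why it discriminates poorly between Ss when few categories are used.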

It must be noted here that this 
method was designed by Mausner 
because his degree-of-shift score does 
not indicate the direction of shift. 
Usually such a score is not necessary. 
In addition, when the number of 
judgment categories is small the 
amount of discrimination possible 
between Ss is small. It must be noted
that since both methods rely ultimately
upon a counting procedure,
they have the advantage of not being
open to attack from the point of view
of the inequality of the distances between
the categories.

The eighth and ninth techniques
of treating absolute judgment data
were derived by the author for a 
weight-judgment technique applied 
to schizophrenics and normals (10). 
The first of these two techniques 
made use of the frequencies while the 
second made use of the physical scale 
of stimuli (weights in this case). 

The frequency method consisted, 
first of all, of tabulating the total 
number of judgments according to 
the categories: very heavy, heavy, 
medium, light, and very light for the 
unanchored and for the anchored 
conditions. If the anchor makes any 
difference in the judgments, the fre- 
quency of certain categories should 
decrease and that of others should in- 
crease. A heavy anchor would tend 
to decrease the frequencies of the 
heavier judgments and increase the 
frequencies of the lighter judgments, 
and vice versa for the light anchor, 
A comparison of the frequency distributions
by judgment category
was made between [...]
maximum discrepancy [...]
two cumulative [...]
designated [the]
category-shift score [...]


An example of the manner of calculation
of the category-shift score is
given below:

a. Tabulation of the frequency of
judgments in each condition (unanchored,
heavy anchor, light anchor)
separately for each S, e.g., subject X:

      VL    L    M    H   VH
NA     5    7    4    6    3
HA     8   10    6    1    0

where VL=very light, L=light,
M=medium, H=heavy, VH=very
heavy; NA=unanchored condition,
HA=heavy anchor condition.

b. Cumulative frequency distribution
from VL to VH of the frequencies
in each category for all conditions;
following the above example the
table under a would be transformed
into the table below:

      VL    L    M    H   VH
NA     5   12   16   22   25
HA     8   18   24   25   25


where the entry in each cell now represents
the frequency of judgments of
the judgment category to which the
cell refers plus all the frequencies of
all the judgments lighter than the one
under consideration.

c. Subtract the cumulative frequencies
of the appropriate NA from
the HA conditions (the LA, or light
anchor, condition from the appropriate
NA conditions) and use the
largest difference as the shift score.
Following the above example:

      VL    L    M    H   VH
HA     8   18   24   25   25
NA     5   12   16   22   25
D      3    6   (8)   3    0

where D=difference between the HA
and NA conditions and the number
in parentheses represents the maximum
difference between the two
cumulative frequency distributions.
This difference is the category-shift
score. If an investigator so desires,
he can evaluate the statistical significance
of the shift separately for each
S. Goodman (2) provides a table for
this purpose. If interested in comparing
groups one can use the maximum
difference between cumulative
distributions (the category-shift score)
as a score for each S. These scores,
which are frequencies, can then be
manipulated statistically. This technique,
like all methods making use
of frequency, can be manipulated
statistically without fear of dealing with
scales of unequal intervals as in the
rating scale method. It has the advantages
of being simple in calculation
and of providing the investigator
with an immediate estimate of statistical
significance.
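
Steps a through c can be sketched in a few lines. The frequencies are those of subject X in the example (the NA entry for M, 4, is inferred from the cumulative table); the function name and the argument order are illustrative assumptions.

```python
from itertools import accumulate

CATEGORIES = ["VL", "L", "M", "H", "VH"]  # ordered from lightest to heaviest


def category_shift_score(minuend_freqs, subtrahend_freqs):
    """Largest difference between two cumulative frequency distributions.

    For a heavy anchor pass (HA, NA); for a light anchor pass (NA, LA),
    following the order of subtraction described in step c.
    """
    cum_minuend = list(accumulate(minuend_freqs))
    cum_subtrahend = list(accumulate(subtrahend_freqs))
    diffs = [m - s for m, s in zip(cum_minuend, cum_subtrahend)]
    return max(diffs), diffs


na = [5, 7, 4, 6, 3]    # unanchored condition, subject X
ha = [8, 10, 6, 1, 0]   # heavy anchor condition, subject X

score, diffs = category_shift_score(ha, na)
print(dict(zip(CATEGORIES, diffs)), score)
```

The score is a count of judgments, so it depends on the number of presentations; scores from Ss who made different numbers of judgments would presumably need to be put on a common footing before being compared.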

The ninth method of computing
the shift score for each S used the difference
between the stimulus (weight)
to which a particular judgment was
assigned and the one to which it
should have been assigned (i.e., the
correct weight) according to prior
verbal instructions given to S. These
differences were then summed separately
for judgments assigned to
weights heavier and weights lighter
than the one to which they should
have been assigned. The difference
between the two differences resulted
in a separate score for the anchored
and unanchored conditions; the difference
between the anchored and
unanchored condition scores in turn
gave rise to the shift score which will
be designated as the stimulus-shift
score. An example of the manner of calculation
of the stimulus-shift score is
given below:

a. The responses (judgments of)
each of the weights were tabulated
as shown below, separately for each
NA, HA, and LA condition, e.g.,
subject X:

[Tally tables for the NA and HA conditions: rows are the weights 200, 250, 300, 350, and 400 gms.; columns are the judgment categories VL, L, M, H, and VH. The individual tallies are not recoverable.]

b. Inspection of the tally tables
for the unanchored condition (NA) 
and heavy anchor condition (HA) 
shows all the “correct” judgments 
along the diagonal, that is, all the re- 
sponses that have been attributed to 
the stimuli to which Ss were in- 
structed to attribute them. Above 
the diagonals are responses that have 
been attributed to stimuli lighter than 
the ones to which Ss were previously 
verbally instructed to attribute them, 
while below the diagonal are the re- 
sponses that have been attributed to 
stimuli heavier than the ones to which 
Ss were previously instructed to at- 
tribute them. Thus it was possible 
to obtain two scores for each condi- 
tion, namely the number of responses 
attributed to stimuli heavier and the 
number attributed to stimuli lighter 
than the stimuli to which they should 
have been attributed according to 
previous instructions. To get a more 
exact measure of discrepancy be- 
tween judged and actual weight, the 
difference in grams between the ac- 
tual and the judged weight was ob- 
tained. 

For example, looking at the entry 
in the NA table above, the entry in 
Cell 200-L gave rise to a discrep- 
ancy score of 50 gms., since the 
judgment light was attributed to a 
weight 50 gms. lighter than the one 
to which it should have been at- 
tributed according to previous in- 
structions; the same holds for the 
entries in the Cells 250-M and 350- 
VH. By adding these three discrep- 
ancy scores, a total discrepancy score 
“lighter” of 150 gms. is obtained. 

Below the diagonal, discrepancy
scores can be obtained in an analogous
manner. Cell 250-VL gives rise
to a discrepancy score of 50 gms. because
the response VL was attributed
to a weight 50 gms. heavier than the
one to which it should have been attributed
according to previous instructions.
In Cell 300-L there are
three responses that have been attributed
to the same stimulus that is
50 gms. heavier than the one to which
they should have been attributed
according to previous instructions,
thus giving rise to a discrepancy score
of 150 gms. The Cell 400-M shows a
discrepancy of 100 gms. because the
response M was attributed to a
weight 100 gms. heavier than the
stimulus to which it should have been
attributed according to previous instructions.
Finally, Cell 400-H
shows a discrepancy score of 100 gms.
because there are two responses that
have been attributed to the same
stimulus that is 50 gms. heavier than
the one to which they should have
been attributed according to previous
instructions. By adding the
above four discrepancy scores a total
discrepancy score “heavier” of 400
gms. is obtained.

By means of the same procedure
two total discrepancy scores can be
obtained from the HA table. The
discrepancy score “lighter” is 50
gms. while the discrepancy score
“heavier” is 1300 gms.

c. Subtract the total discrepancy
score “lighter” from the total discrepancy
score “heavier” for each
condition, thus obtaining an estimate
of the bias or net tendency to make
errors in the direction of attributing
judgments to weights heavier or
lighter than the ones to which they
should be attributed according to
instructions. In the example given
the net bias score for condition NA
is 400 gms. − 150 gms. = 250 gms.
while the net bias score for condition
HA is 1300 gms. − 50 gms. = 1250
gms.

d. Finally to obtain a shift score
subtract the net score of NA from
that of HA (of LA from NA). In
this case the subtraction of 250 gms.
from 1250 gms. yields a stimulus-shift
score of 1000 gms. for subject X.
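
Steps b through d can be sketched as below. The off-diagonal NA cells are the ones described in the text; the correct weight for each category (200 to 400 gms. in 50-gm. steps) is inferred from the discrepancies quoted for those cells. Since the individual HA tallies are not available, the HA totals (50 and 1300 gms.) are taken directly from the text. All function and variable names are hypothetical.

```python
# Weight (gms.) to which each judgment should be attributed; inferred
# from the worked example, e.g. Cell 200-L is 50 gms. too light.
CORRECT_WEIGHT = {"VL": 200, "L": 250, "M": 300, "H": 350, "VH": 400}


def discrepancy_scores(tallies):
    """Return (lighter, heavier) total discrepancy scores in gms.

    tallies maps (weight, category) to the number of such responses.
    Cells on the diagonal contribute nothing and may be omitted.
    """
    lighter = heavier = 0
    for (weight, category), count in tallies.items():
        diff = weight - CORRECT_WEIGHT[category]
        if diff > 0:        # judgment attributed to a heavier weight
            heavier += count * diff
        elif diff < 0:      # judgment attributed to a lighter weight
            lighter += count * -diff
    return lighter, heavier


def net_bias(lighter, heavier):
    """Step c: total 'heavier' minus total 'lighter' discrepancy."""
    return heavier - lighter


# Off-diagonal cells of the NA table as described in the text.
na_tallies = {(200, "L"): 1, (250, "M"): 1, (350, "VH"): 1,
              (250, "VL"): 1, (300, "L"): 3, (400, "M"): 1, (400, "H"): 2}

na_lighter, na_heavier = discrepancy_scores(na_tallies)
na_net = net_bias(na_lighter, na_heavier)
ha_net = net_bias(50, 1300)          # HA totals quoted in the text
stimulus_shift = ha_net - na_net     # step d
print(na_lighter, na_heavier, na_net, ha_net, stimulus_shift)
```

Unlike the category-shift score, this score weights each displaced judgment by its distance in grams, which is consistent with its producing fewer tied scores between Ss.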
This last score made use of the
physical scale underlying the judgment
scale. It is based on obtaining
the difference in the stimulus dimension
under consideration, that is, between
the stimulus being judged at
any given time and the stimulus to
which the judgment correctly belongs.
This shift score takes into account
not only the direction of shift
and the frequency with which various
categories are used but also considers
the degree of shift, i.e., how many
gms. difference there is between the
stimulus being judged and the stimulus
which S thinks he is judging.

Since both the category-shift score
and the stimulus-shift score were
computed on the same set of data,
it was possible to compare them.
Table 1 provides us with the rank-order
correlations between the two
types of shift scores, computed on 16
normals and 16 schizophrenics during
different experimental sessions and
due to different anchors. Since all
coefficients are high either score can
be substituted for the other. Further
inspection of the scores makes clear,
however, that the stimulus-shift scores
show greater discrimination than the
category-shift scores. Amount of
discrimination between Ss can be
roughly measured in terms of the
number of tied scores for Ss. The
total of such tied scores for the
weight-shift score over all Ss and
conditions was 49 while that for the
category-shift score was 107.

TABLE 1
RANK-ORDER CORRELATION COEFFICIENTS
BETWEEN TWO METHODS OF MEASURING
SHIFT FOR 16 EQUATED NORMALS AND
PATIENTS AS A FUNCTION OF HEAVY
AND LIGHT ANCHOR CONDITIONS IN
TWO SUCCESSIVE SESSIONS

Session  Condition      Normals      Patients
1        Heavy anchor   .91**        [illegible]
1        Light anchor   .96**        .93**
2        Heavy anchor   [illegible]  [illegible]
2        Light anchor   [illegible]  [illegible]

** Significant beyond the .01 level.

In conclusion, it can be stated here
that while some of the criticisms (like
those made against the direct use of
ratings in statistical manipulations
based on assumptions of equal intervals)
would advise against any use
of the method of computing a shift
score, most of the criticisms are of
the nature of specifying under what
conditions a particular method might
or might not show itself up to advantage.

* A measure of kinesthetic sense can be
established by adding the total discrepancy
scores “lighter” (above the diagonal) and
“heavier” (below the diagonal) obtained in
computing the shift measure; the smaller this
total is, the better the kinesthetic sense. Thus,
this score is really an error score. Using the
example given for calculation of the shift
measure, a kinesthetic score of 150 gms. + 400
gms. = 550 gms. is obtained for the NA condition
and a score of 50 gms. + 1300 gms.
= 1350 gms. is obtained for the HA condition.


At our present stage of ignorance
about how genes determine behavior, 
we might well concentrate on experi- 
mental studies of lower organisms. 
Their reactions may be thought of as 
the emergent behavior which has de- 
veloped through evolution into the 
complex behaviors of higher organ- 
isms. Knowledge gained from such 
studies may provide conceptual mod- 
els leading to an understanding of 
how hereditary and stimulus compo- 
nents interact in determining higher 
forms of behavior. 

For this purpose the use of lower 
organisms offers distinct advantages. 
There is a brief time span between 
generations, permitting E to perform 
in a short time period the various 
crossings essential to fundamental 
genetic studies. Each generation pro- 
duces abundant Progeny, enabling 
E to recover the extreme behavior 
types required in selective breeding 
experiments. And further, the genet- 
ics of their morphology is better un- 
derstood than is that of higher forms, 
The fruit fly, Drosophila, has all of 
these advantages, 

First, however, reliable techniques 
for measuring individual differences 
(hereafter referred to as IDs) in be- 
havior must be developed. Reliabil- 
ity coefficients must be calculated, 
and they must be high. The problem 
reduces to the question: How can we
observe the behavior of large num- 
bers of very small Ss and at the same 
time reliably measure the perform- 
ance of each S? 

This paper presents a method
which accomplishes both these objectives.
We call it the method of
“mass screening with reliable individual
measurement.” As an illustration
of the method, we will show that
in the mass observation of a particular
behavior of Drosophila, reliability
coefficients of about .9 can be secured
in an experimental test period
of four minutes. During this time
sample observations of 15 sec. each
were made. Each individual was observed
as a member of a group of
other flies. The method shows that
Drosophila IDs can be measured as
reliably as human IDs. Indeed, we
know of no experiment on men comprising
15 brief observations that yields
a reliability as high as .9.

Genetics has up to the present concerned
itself with physical characteristics
rather than with behavior. The
reliability of individual measurement
is not so obviously important in the
study of morphological characteristics;
usually the characteristic is
either present or absent, or present
in only a small number of forms, and
its presence or absence is immediately
obvious (e.g., eye color, [...]
wing, bar eyes, etc.). Individual differences
in behavior, on the other
hand, are not so easily recognized;
such recognition requires special
methods.

There are at least three reasons
why we need reliable measurement of
such IDs:

1. Reliable phenotypic differentiation
is needed for selective breeding
for homozygous lines. Both the purity
of different strains and the rapidity
of selection are limited by our capacity
to discriminate between individuals,
since, as the errors of measurement
decrease, the probability
increases that individuals with the
same score will be genetically similar.

2. The study of learning also requires
reliable individual measurement
because of the relation between
the strength of the unconditioned response
and conditioning.² (Obviously,
for those individuals in whom
the unconditioned response has zero
strength, conditioning is impossible.)
We believe that the study of learning
requires reliable knowledge of the
distribution of IDs in the population
[...]. Much effort has
been spent in demonstrating the influence
of environment on behavior.
It is patent, however, that environmental
influence must be an influence
on something, and therefore the laws
of that influence must differ as the
thing influenced differs.

² Use is made of conditioned response
terminology for convenience of exposition. It
is not intended to represent a theoretical
statement about the nature of the learning
process.

3. Reliable individual measurement
is essential for answering three
questions about the generality of any
[...]: (a) Temporal generality;
how long does a given disposition to
respond endure and to what extent
is the rank ordering of individuals
maintained over this period? (b) Stimulus
generalization; over what range of
stimuli can the response be evoked
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and how well is the rank ordering of 
individuals maintained over that 
range? (c) Behavior generality; to 
what extent do other behaviors pre- 
serve the rank ordering of individ- 
uals? 

Efficient methods of observation 
are also a desideratum for studying 
small organisms. It is a theorem in 
sampling theory that the detection of 
extreme cases, a necessity in genetic 
selection experiments, requires the 
observation of large numbers of Ss 
since the probability of finding these 
extreme cases is a direct function of 
the sample size. Rapid observation 
permits the examination of large 
numbers of Ss and thus increases the 
sampling stability essential to the 
generality of the findings. Further- 
more, replication of experiments can 
be undertaken without excessive 
labor. 
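The dependence on sample size can be made concrete with a short sketch. It assumes, purely for illustration, that each S independently has the same small probability q of belonging to the extreme class; the value of q and the function name are hypothetical and are not taken from the experiment.

```python
def prob_at_least_one_extreme(q: float, n: int) -> float:
    """Probability that a sample of n independent Ss contains at least
    one extreme case, when each S is extreme with probability q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - q) ** n


# Hypothetical rate of extreme cases: 1 in 100.
for n in (10, 100, 1000):
    print(n, round(prob_at_least_one_extreme(0.01, n), 3))
```

Under this assumption the chance of recovering an extreme S rises steadily with the number of Ss screened, which is the practical argument for rapid mass observation.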

The next section of this paper pre- 
sents a method for reliably measuring 
IDs in behavior by means of mass 
screening, a procedure which achieves 
the objective of reliably classifying 
every individual's behavior without
handling or observing each small organ- 
ism individually. The method is com- 
pletely general and easily applicable 
to the study of any behavior, both 
unconditioned and conditioned. 

This objective is illustrated by the 
results of an experiment that em- 
ployed the mass screening technique 
in the study of the geotropic reactions 
of Drosophila melanogaster. A series 
of 15 successive mass screenings, for 
example, produced 16 test tubes, each 
containing a different geotropic class 
of Drosophila. The flies in the tubes 
0 to 15 represent different degrees of 
the negative geotropism. That is, the 
flies are differentiated on this final 
composite 16-point scale based on 15 
prior mass screenings in which the 
individuals were not separately handled.
The reliability coefficient of
this final scale score is determinable
and, in principle, it can be increased
to any desired value by further mass
screenings.


EXPERIMENTAL DESIGN AND 
ANALYTIC PROCEDURES 


The method consists of cumulating
a total composite score $X_t$, for
each organism in any behavior, X,
where:

$$X_t = X_1 + X_2 + X_3 + \cdots + X_n.$$

$X_1, X_2, \ldots, X_n$ represent scores
earned by it in n comparable sample 
mass screenings. Setting up such a 
total score is the essence of psycho- 
logical test theory. Most of the for- 
mulae used in this paper are standard 
in psychological test theory. A sim- 
ple summary of them can be found in 
J. P. Guilford’s Psychometric Methods, 
Chaps 13, 14, 15 (1). Guilford’s ra- 
tionale of the formulae, however, is 
based on the factorial truth-error 
doctrine. In another paper one of the 
authors develops them with fewer 
assumptions (3). Our procedure 
adapts these principles to the prob- 
lem of calculating reliability coeffici- 
ents for the scores of individuals who 
are only observed as members of a 
large group. 

The main steps of the procedure
are as follows:

1. Conceptualize the behavior
property, X, that is to be scaled, and
operationally define it with sufficient
specification to indicate the general
conditions under which it may be observed.

2. Devise a standard test sample
procedure for obtaining a unit measure
of IDs in X, one which has the
advantage of permitting observation
of a large group of Ss at one time
while locating the total, N, of individuals
in subgroup classes scored
0, 1, 2, ..., k in magnitudes of X.

3. Take a randomly bred sample
of the Ss and mass screen them
through n replications of the standard
procedure. At the end of every
replication, score each subgroup by
its cumulative total score, $X_t$, then
combine subgroups with the same
$X_t$ score and proceed with the next
replication.

4. Calculate the reliability coefficient,
$r_{tt}$, of each successive $X_t$ and
decide on the value of n which will
yield a reliability of sufficiently high
magnitude, then examine the form
of the distribution of the $X_t$ scores of
the individuals.

5. If the original method results in
a low reliability or an excessively
skewed distribution of final composite
scores, alter the standard procedure,
take a second random sample and
repeat the general procedure. Several
such experiments may be required
before an adequate method of observation
is discovered.

The details of the steps of this general
procedure will be developed and
illustrated by an experiment conducted
by one of the authors on IDs
in the geotropic reaction of Drosophila.


1. Conceptualization and Definition
of the Behavior


The behavior chosen was the unconditioned
disposition to go in the
direction opposite to gravity. This
negative geotropism is operationally
defined as an upward movement of the
fly whenever it is placed in any situation
permitting travel upward, other
external stimuli which might induce
vertical movement being controlled.


2. Standard Test Sample Procedure


The test situation consists of two
test tubes, a lower one standing upright
in a rack, the other inverted
over the mouth of the lower one.
Since the flies are also phototropic,
the light source was placed at right
angles to the vertical. A group of
flies are placed in the lower tube,
shaken to the bottom, and then allowed
to ascend. At the end of an
arbitrary “cutting point” time of 15
sec. a card is inserted between the
lower and the upper tubes. The upper
tube is scored and labeled “1,”
and the lower tube “0.”

Thus the standard sample observation
in this case is like a dichotomous
test item, the top tube scored “pass”
and the lower one “fail.” A cutoff
of 15 sec. was found empirically
to divide the group of flies into two
approximately equal pass and fail
groups, a division which avoids
skewness in the distribution of final
composite $X_t$ scores.

It must be emphasized that dichotomous
scoring is not a necessary
condition of the method. The standard
procedure could have been designed
to provide more classes. The
dichotomous break was chosen for
experimental convenience.


This standard test procedure,
though satisfying the operational
definition of geotropism, might not
elicit solely a systematic reaction
to gravity. Since the test tube situation
permits only movement upward
it may be that, if there is an activity
differential among the Ss, the flies
which appear to be upwardly mobile may be
merely the active flies. Only additional
experiments which control activity can
resolve the matter. Thus, we use the
term “geotropism” here only in an
operational sense, recognizing that
the movement observed in this situation
might later be shown to be significantly
influenced by additional components.


3. Choice of an Unselected Sample


Since the range and reliability of
scores is partly a function of the
heterogeneity of the Ss, a stock of
unselected Drosophila with a history of
random mating was chosen.


4. Mass Screening 


A random sample of 106 flies was 
screened and scored by the following 
procedure. 

a. First composite score, $X_{t1} = X_1$.
The results of the first observation
are shown in Fig. 1, which reproduces
part of the score sheet actually used.
Under $X_1$ and $f_1$ it can be seen that
54 flies ascended to the upper tube,
earned a “pass” and thus received a
score of $X_1 = 1$. There are 52 flies that
remained in the lower tube, earned a
“fail” and received a score of $X_1 = 0$.
The scores, $X_{t1}$, of this trial take the
values of 1 and 0.

b. Second composite score, $X_{t2} = X_{t1} + X_2$.
The 54 flies with $X_{t1} = 1$ were
put through the standard procedure
a second time for Trial 2. The 46 flies
that ascended earn a tube score, $X_2 = 1$,
and a composite score $X_{t2} = 2$;
the 8 remaining down have $X_2 = 0$ and
$X_{t2} = 1$, as shown. In similar fashion
the flies with $X_{t1} = 0$ divide into 22
earning $X_2 = 1$, $X_{t2} = 1$ and 30 earning
$X_2 = 0$, $X_{t2} = 0$.

c. Third composite score, $X_{t3} = X_{t2} + X_3$.
The standard procedure is repeated
for each of the three $X_{t2}$
classes resulting from Trial 2.

Note that, even though there are four
tubes of flies at the end of Trial 2,
there are only three $X_{t2}$ classes. The
two subgroups with 8 and 22 flies
have been combined in one tube because
both received the same score,
$X_{t2} = 1$, i.e., the composite score
is the cumulative sum of all previous
scores irrespective of the order in
which the individual “passes” and
“fails” were obtained.

d. Additional composite scores, $X_{t4}$,
$X_{t5}, \ldots$. The procedure is continued
by taking further sample observations;
at the end of each one, subgroups
having the same $X_t$ score are
combined for the next observation.
Figure 1 shows the results schematically
up through [illegible].

The reason for the “experimental
convenience” of dichotomous classes
in the standard procedure should now
be apparent; with more than two
classes the number of subgroups becomes
unmanageable.
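The pooling step can be sketched as simple bookkeeping. The counts for Trials 1 and 2 below are those reported in the text (54 of 106 flies ascend on Trial 1; 46 of the 54 and 22 of the 52 ascend on Trial 2); the function name and the dictionary layout are illustrative assumptions, not part of the original procedure.

```python
from collections import Counter


def pool_after_trial(classes, passes):
    """Advance one dichotomous mass screening.

    classes: {composite score: number of flies in that tube}
    passes:  {composite score: number of those flies that ascended}
    Returns the new {composite score: count}, with subgroups that reach
    the same composite score combined into one class.
    """
    pooled = Counter()
    for score, count in classes.items():
        up = passes[score]
        if not 0 <= up <= count:
            raise ValueError("passes must lie between 0 and the class size")
        pooled[score + 1] += up          # "pass": composite score rises by 1
        pooled[score] += count - up      # "fail": composite score unchanged
    return dict(pooled)


start = {0: 106}                                   # all flies unscored
after_1 = pool_after_trial(start, {0: 54})         # Trial 1
after_2 = pool_after_trial(after_1, {1: 46, 0: 22})  # Trial 2
print(after_1)
print(after_2)
```

After n dichotomous trials there are at most n + 1 classes to handle, whereas keeping every pass/fail history separate would require up to 2^n tubes; this is the bookkeeping advantage of combining subgroups.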


5. Analysis 


a. The distribution of $X_t$ scores.
One of the objectives of experimental
behavior genetics is reliable differentiation
between individuals and
subsequent genetic validation of differences
by means of selective breeding.
Since, for a given behavior, it is
assumed that there is a range of ability
and that the Ss in a population
are distributed over the range, it follows
that any methods which tend to
pile up the final scores in a few extreme
categories should be eschewed
in favor of others which distribute the
scores more widely. [Several lines are
illegible in the source here; the legible
fragments indicate that the individuals
under observation must be differentiated
on the behavioral scale, and that
failure to do so prevents the discovery
of differences that exist.] E can usually
control the form of the distribution of
total $X_t$ scores. In the illustrative
experiment this control was accomplished
through selection of the time interval
in which the response can be performed, i.e.,
the proportions, p and q, of “passes” and
“fails” vary as a function of the
amount of time allowed in the test
situation. From several experiments
it may be shown that when
p > .5, the $X_t$ distribution is negatively
skewed, and when p < .5, $X_t$ is
positively skewed. Either type of
skewness is undesirable because cases
pile up in the extreme categories
where, for the purposes of selective
breeding, the finest differentiations
are required. This is illustrated in Table
1. The frequency distribution
of the composite score $X_t$, from Fig.
1, is given in the first row of entries.
A 15-sec. cutoff was used for
this sample. The mean proportion
earning a score of $X_i = 1$ on the ten
successive tests is p = .5.
The distribution is seen to be platykurtic
with no appreciable piling up
of the cases in the extreme categories.
This is the result of the approximately
50-50 cut on each trial.

[Fig. 1. Mass Screening Score Form]
The effects of extreme cuts are
shown in the other rows of Table 1.
For the group with an 8-sec. cutoff
in the standard test the proportion
getting into the upper tube is p = .16,
with the result that the composite
$X_t$ scores are very positively skewed
with a pile up of flies in the 0 category.
The opposite extreme cut of 27
sec. gives a p = .66, with a pile up at
the high $X_t$ scores.

b. Reliability of $X_t$ scores. It is
important that the composite $X_t$
score be reliable if E is to use the differentiations
between individuals as
the basis for further experimental
work on selective breeding, conditioning,
or the investigation of the
generality of behavior X. The reliability
coefficient, $r_{tt}$, cannot be computed
by the split-half method in the
mass screening method because combining
into a single group all Ss with
the same composite $X_t$ score loses the
specific sample score history of each
individual. The coefficient can be
estimated accurately, however, from
the variances of the composite $X_t$
score and of the individual test sample
scores, as follows (3, Formula 12):


$$r_{tt} = \frac{n}{n-1}\left(1 - \frac{\sum V_i}{V_t}\right), \qquad [1]$$

where:

$n$ = number of standard test samples or replications.
$\sum V_i$ = sum of the variances ($\sigma_i^2$) of the $n$ test samples.
$V_t$ = variance of the final composite $X_t$ scores, i.e., $\sigma_t^2$.


TABLE 1

DISTRIBUTION OF INDIVIDUALS IN COMPOSITE $X_t$ SCORE
(Entries are frequencies)

                        Composite X_t score
 p    Cutoff     0   1   2   3   4   5   6   7   8   9  10     N
.50   15 sec.   [frequencies illegible in source]            106
.16    8 sec.   54  13   8   8   7  [remaining illegible]    104
.66   27 sec.    0   9   3   3   4   4   4  16  24  12  13    92


When, as in the present case, the
standard procedure gives a dichotomous
cut, the variance, $V_i$, of any
particular sample observation is:

$$V_i = pq, \qquad [2]$$

where:

$p$ = proportion of individuals above
the cut in all subgroups
= mean score when, as in the example,
those above the cut are
scored 1, those below 0.
$q = 1 - p$.
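Equations 1 and 2 can be checked against the Trial 2 figures given earlier (54 of 106 flies passing Trial 1, 68 of 106 passing Trial 2, and composite classes of 30, 30, and 46 flies at scores 0, 1, and 2). The sketch below is a minimal illustration; the function names are hypothetical, and the composite variance is computed by dividing by N rather than N - 1, which is an assumption.

```python
def composite_variance(freq):
    """Variance of composite scores from {score: frequency}, dividing by N."""
    n_total = sum(freq.values())
    mean = sum(score * f for score, f in freq.items()) / n_total
    return sum(f * (score - mean) ** 2 for score, f in freq.items()) / n_total


def reliability(trial_props, freq):
    """Equation 1, with each V_i = p * q from Equation 2.

    trial_props: proportion passing on each of the n dichotomous trials
    freq:        {composite score: frequency} after those n trials
    """
    n = len(trial_props)
    if n < 2:
        raise ValueError("at least two trials are needed")
    sum_vi = sum(p * (1.0 - p) for p in trial_props)
    return n / (n - 1) * (1.0 - sum_vi / composite_variance(freq))


p1 = 54 / 106            # Trial 1: 54 flies ascend
p2 = (46 + 22) / 106     # Trial 2: 46 + 22 flies ascend
r_tt_2 = reliability([p1, p2], {0: 30, 1: 30, 2: 46})
print(round(r_tt_2, 2))
```

Under these assumptions the result rounds to .62, the value tabled for the two-trial composite of the 15-sec. group.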

The values of the reliability coefficients
and of other constants for several
Drosophila experiments are given
in the third rows of Table 2. The first
group is the one presented in Fig. 1,
in which 15 sample observations were
finally taken under conditions believed
to produce optimum differentiation
between individuals. It will be
noted that, beginning with the fourth
column of entries, after the first few
“adjustment” trials the $r_{tt}$
progressively increased to .87 for the
final composite score based on 15
sample observations.

E naturally asks: are the successive
sample observations strictly
comparable measures of the property
X, here the negative geotropic reaction?
The additional constants of
Table 2 give insight into this question.

If the individuals systematically
improve or deteriorate in performance
the mean score, $p_i$, and the variance,
$V_i = p_i q_i$, of successive observations
will both change. In the first
and second rows of Table 2 we see
that in our example $p_i$ and therefore
$V_i$ both remain relatively constant.

If the individuals become either
more reliably differentiated or less so
as screening proceeds, then the reliability
coefficient will not increase according
to the “Spearman-Brown
law” of increased reliability with the
addition of comparable sample observations.
Evidence on this point
can be secured in two ways.
TABLE 2
RELIABILITY COEFFICIENTS AND OTHER CONSTANTS IN THE
DROSOPHILA GEOTROPIC EXPERIMENTS

                               SAMPLE OBSERVATION, X_i
Group     Const.     1    2    3    4    5    6    7    8    9   10   11   12   13   14   15

15 sec.   p_i       [?]  .64  .38  .50  .53  .47  .53  .55  .63  .55  .48  .49  .48  [?]  [?]
N=106     V_i       [?]  .23  .24  .25  .25  .25  .25  .25  .23  .25  .25  .25  .25  .25  [?]
          r_tt           .62  .49  .60  .70  .75  .78  .80  .82  .82  .83  .83  .84  .86  .87
          n               23   60   49   42   39   39   39   39   43   42   44   47   44  [?]
          r̄_ij           .45  .24  .28  .31  .33  .33  .33  .33  .31  .31  .30  .29  .30  [?]

8 sec.    p_i       [?]  .15  .20  .11  .18  .16  .18  .17  .18  .21
N=104     V_i       [?]  .13  .16  .10  .15  .14  .15  .14  .15  .17
          r_tt           .66  .63  .68  .68  .72  .75  .77  .79  .81
          n               20   34   35   47   44   47   44   44   44
          r̄_ij           .49  .36  .35  .29  .30  .29  .30  .30  .30

27 sec.   p_i       [?]  .83  .67  .71  [?]  .67  .71  [?]  [?]  [?]
N=92      V_i       [?]  .14  .22  .21  .21  .22  .21  [?]  [?]  [?]
          r_tt          -.04  .15  .51  .59  .69  .72  [?]  .76  [?]
          n                   298   76   67   44   44   57   57   54
          r̄_ij          -.02  .06  .20  .22  .27  .27  [?]  .25  [?]

[?] marks entries illegible in the source. For the 8-sec. and 27-sec.
groups the column alignment of the p_i and V_i rows is uncertain.




The first is to discover whether the
mean correlation, $\bar{r}_{ij}$, between sample
observations entering into the composite,
$X_t$, changes for successive $X_t$
scores. From the familiar Spearman-Brown
approximation (3, Formula
17), we note that the reliability coefficient,
$r_{tt}$, for any composite $X_t$
based on $n$ samples is:

$$r_{tt} = \frac{n\,\bar{r}_{ij}}{1 + (n-1)\,\bar{r}_{ij}}, \qquad [3]$$

whence, solving for $\bar{r}_{ij}$:

$$\bar{r}_{ij} = \frac{r_{tt}}{n - (n-1)\,r_{tt}}. \qquad [4]$$

The successive values of $\bar{r}_{ij}$ are
given in Table 2, fifth rows. We note
that after the first few trials $\bar{r}_{ij}$ plateaus
around .30.
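Equations 3 and 4 are inverses of one another, which a short sketch can confirm using the final reliability of .867 quoted later in the text for the 15-trial composite. The function names are illustrative assumptions.

```python
def spearman_brown(r_bar, n):
    """Equation 3: reliability of a composite of n comparable samples
    whose mean intercorrelation is r_bar."""
    return n * r_bar / (1.0 + (n - 1) * r_bar)


def mean_intercorrelation(r_tt, n):
    """Equation 4: mean intercorrelation implied by a composite
    reliability r_tt based on n samples."""
    return r_tt / (n - (n - 1) * r_tt)


r_bar_15 = mean_intercorrelation(0.867, 15)
print(round(r_bar_15, 2))                      # mean intercorrelation at n = 15
print(round(spearman_brown(r_bar_15, 15), 3))  # recovers the starting reliability
```

The value obtained for n = 15 rounds to .30, in line with the plateau described above.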

The other way is for E to set a desired
reliability for the final composite,
and solve for the value of n in
Equation 3 that will achieve this desired
reliability. Suppose E desires a
reliability of .95. Call this $R_{tt}$. Set
$R_{tt}$ into Equation 3 and solve for n:

$$n = \frac{R_{tt}\,(1 - \bar{r}_{ij})}{\bar{r}_{ij}\,(1 - R_{tt})}. \qquad [5]$$

The values of n for $R_{tt} = .95$ are given
in the fourth rows of Table 2. In general
they remain around 45 trials.
This finding has the practical value
of informing E how many sample
trials are necessary to achieve the reliability
he desires. If n turns out to
be large, a design having more
classes per trial might be considered
as a means of reducing the number of
trials required.
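Equation 5 can be evaluated directly. The sketch below uses the plateau value of .30 for the mean intercorrelation and the desired reliability of .95 from the text; the function name is an illustrative assumption.

```python
import math


def trials_needed(r_bar, target):
    """Equation 5: number of comparable sample observations needed for a
    composite reliability `target`, given mean intercorrelation r_bar."""
    if not 0.0 < r_bar < 1.0:
        raise ValueError("r_bar must lie strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    return target * (1.0 - r_bar) / (r_bar * (1.0 - target))


n_exact = trials_needed(0.30, 0.95)
print(round(n_exact, 1))      # fractional number of trials
print(math.ceil(n_exact))     # whole trials needed to reach the target
```

With these inputs the requirement comes to roughly 44 to 45 trials, which agrees with the statement that the tabled values remain around 45.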
When the individual test sample
scores are not available, as is the
case when groups are screened on the
multiple-unit discrimination maze,
the reliability coefficient can be
computed directly from the final distribution
of $X_t$ scores by means of
the Total Score formula (3, Formula [illegible]):

$$r_{tt} = \frac{n}{n-1}\left(1 - \frac{M_t - M_t^2/n}{V_t}\right), \qquad [6]$$

where $M_t$ equals the mean of the final
composite $X_t$ scores.
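As a rough check on Equation 6, the sketch below applies it to the Trial 2 composite distribution given earlier (30, 30, and 46 flies at scores 0, 1, and 2). The function name is hypothetical, and the variance is again computed by dividing by N, which is an assumption.

```python
def total_score_reliability(freq, n):
    """Equation 6: reliability from the final composite distribution alone.

    freq: {composite score: frequency}
    n:    number of dichotomous sample observations in the composite
    """
    if n < 2:
        raise ValueError("at least two sample observations are needed")
    n_total = sum(freq.values())
    mean = sum(score * f for score, f in freq.items()) / n_total
    var = sum(f * (score - mean) ** 2 for score, f in freq.items()) / n_total
    return n / (n - 1) * (1.0 - (mean - mean ** 2 / n) / var)


r_total = total_score_reliability({0: 30, 1: 30, 2: 46}, n=2)
print(round(r_total, 2))
```

The figure obtained is slightly lower than the one given by Equation 1 for the same data. Equation 6 in effect treats every trial as having the same proportion of passes, so it equals Equation 1 only when the $p_i$ are all equal and otherwise falls below it.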

c. Domain validity coefficient of the
composite score, $X_t$. The reliability
coefficient, $r_{tt}$, though necessary in
the above formulations, is not the
best statement of the reliability of
the composite $X_t$. A more meaningful
index is the correlation between the
$X_t$ scores and that on an indefinitely
large number of screenings, namely
$X_{t\infty}$. Though the “true score,” $X_{t\infty}$, is
not available, the correlation $r_{t\infty}$ can
nevertheless be estimated as follows
(3, Formula 21):

$$r_{t\infty} = \sqrt{r_{tt}}. \qquad [7]$$

Thus, in our case our $X_t$ based on
fifteen screenings would correlate
$r_{t\infty} = \sqrt{.867} = .93$ with a perfectly reliable
measure based on many such
screenings. This coefficient also has
the following added meaning: If we
had the true score of each fly based
on many sets of 15 screenings, the
ratio of the standard deviation of
these true scores to that of the observed
$X_t$ score would be .93. In
short, the distribution of true scores
would look much like that actually
observed.

d. Individual variance (“errors of
measurement”). In order to conduct
experiments on selective breeding,
conditioning, or generality it is necessary
to get a practical estimate of
the amount of difference in $X_t$ scores
among individuals that is undetermined,
i.e., not assignable to known
sources of variation. This estimate is
the individual variance, $V_e$ (3, Formula
23a), where:

$$V_e = V_t\,(1 - r_{tt}). \qquad [8]$$

In our example for $X_{t15}$, $\sigma_t = 4.40$,
hence the individual standard deviation
is:

$$\sigma_e = 4.40\sqrt{1 - .867} = 1.6.$$

The necessity of a high reliability
can be seen in the above formula: as
the reliability approaches unity the
amount of variation attributable to
individual variance tends to vanish.
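The two worked figures above (.93 and 1.6) can be reproduced in a few lines. In this sketch the value 4.40 is treated as the standard deviation of the composite scores, since that is how it enters the worked calculation; the function names are illustrative assumptions.

```python
import math


def domain_validity(r_tt):
    """Equation 7: correlation of the composite with the score on an
    indefinitely large number of screenings."""
    return math.sqrt(r_tt)


def individual_sd(sd_total, r_tt):
    """Square root of Equation 8: standard deviation of the individual
    ("error") variance, given the composite standard deviation."""
    return sd_total * math.sqrt(1.0 - r_tt)


r_tt_15 = 0.867     # reliability of the 15-trial composite
sd_t_15 = 4.40      # composite standard deviation used in the worked example

print(round(domain_validity(r_tt_15), 2))
print(round(individual_sd(sd_t_15, r_tt_15), 1))
```

As the reliability passed to `individual_sd` approaches 1, the returned value approaches 0, which is the point made in the closing sentence of the paragraph above.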

Nonuniformity of individual vari- 
ance. The individual variation, how- 
ever, is most likely not constant over 
the final distribution: (a) an extreme
score can vary in only one direction,
towards the mean; (b) the individuals
receiving extreme scores have
shown perfectly consistent perform- 
ance throughout, that is, either they 
have always scored a zero or they 
have always scored one. Hence, it 
might be expected that the individual 
variation, as estimated by a retest, 
should be much smaller at the ex- 
tremes than in the middle of the dis- 
tribution. 

Empirical check. To assess this possibility
a retest or validation experiment
may be performed. In our illustration,
the Ss receiving extreme
$X_{t15}$ scores of 15 and 14 were combined
and put through n' = 10 additional
trials; also those receiving middle
$X_t$ scores of 7 and 8 were put through
a retest of 10 trials. For the extreme
categories $\sigma^2 = 4.00$, while for the
middle categories $\sigma^2 = 5.82$, the latter
being significantly larger than the
predicted variance for the middle
categories. It is evident that the assumption
of uniformity of individual
variance over the whole $X_t$ scale is
doubtful.


LIMITS OF SELECTIVE BREEDING


How many generations is it necessary
or practical to continue a selective
breeding program, i.e., what are
the criteria for stopping? The individual
standard deviation, $\sigma_e = 1.6$,
provides an answer to this question:
it is useless to attempt further selection
in any line beyond the point
where its $\sigma_t = \sigma_e$; at that point the
method of observation no longer reliably
differentiates individuals, i.e.,
neither selection nor the evaluation
of the results of selection is any
longer possible. In our case, no further
selective breeding would be attempted
in any line whose $\sigma_t$ was
much below 1.6.


SUMMARY 


Fast breeding, prolific, small organisms
are pre-eminently suited to
studies in the field of behavior genetics.
Their value as experimental Ss
is further enhanced by the method of
mass screening that succeeds in combining
the objective of reliable individual
measurement with that of
mass observation. Hence, it is now
possible to achieve the experimental
desiderata of efficiency, reliability,
and brevity in the field of behavior
genetics. The method is illustrated
by experiments on the geotropic
responses of Drosophila.
EYSENCK'S TREATMENT OF THE PERSONALITY
OF COMMUNISTS


RICHARD CHRISTIE
Columbia University¹


A current problem in personality
theory is that of the relationship between
personality variables and susceptibility
to deviant political ideologies.
A considerable amount of
material has been collected on individuals
on the right-wing of the body
politic. A current source of frustration
for American psychologists, however,
is the paucity of relevant data
on members of the extreme left. For
various reasons, recent years have
seen a marked shrinkage in the size
of this population and an increase
in sampling difficulties. It is therefore
of interest to find that attention is
directed toward the personality characteristics
of communists, among
others, by H. J. Eysenck of Maudsley
Hospital, London, in his recent book
The Psychology of Politics (8).

One of Eysenck's major contentions
is that communists and fascists
are similar in being “tough-minded”
and “authoritarian.” This is a highly
[illegible] hypothesis. However, a
careful examination of his data indicates
that if his measures of these
attributes were valid, quite different
conclusions would be drawn.

The present critique shall be restricted
to methodological points and
their implications for a more adequate
understanding of the relationships
between certain aspects of personality
and political ideology.
Eysenck's interpretation of his data
on communists and fascists is crucial
for his theoretical schema but a
thorough evaluation of the latter
would add unduly to the length of
this paper.² Detailed documentation
in support of the present criticism
will be presented and only those inaccuracies
and inconsistencies of Eysenck's
which are pertinent to our
specific topic shall be cited.

¹ This paper was written while the author
was a Fellow at the Center for Advanced
Study in the Behavioral Sciences. An earlier
draft was substantially modified as a result of
suggestions made by colleagues. Ramon J.
[surname illegible] assisted the author in
statistical calculations.


ARE COMMUNISTS AND FASCISTS
SIMILAR IN BEING “TOUGH-MINDED”?


References to the finding that
communists and fascists differ from
less politically deviant samples in
being more “tough-minded” are scattered
throughout The Psychology of
Politics. This is considered demonstrated
by scores on a scale designed
to measure “tough-tender-mindedness.”
Examination of the evidence
indicates: (a) that no confident generalizations
are justified upon the
basis of the sampling procedures used
in selecting samples from the parent
populations; (b) that the scale does
not measure “tough-mindedness,”
at least among communists; and (c)
that Eysenck engages in misleading
manipulations of communist and
fascist test scores in contrasting
them to various “neutral groups.”

² A review of The Psychology of Politics by
the writer may be found in The American
Journal of Psychology, 1955, 68, 702-704.

Purported evidence for tough- 
mindedness comes from two studies. 
The first was conducted by Eysenck; 
the second was an unpublished doc- 
toral dissertation done at the Uni- 
versity of London by Thelma Coulter 
which is cited by Eysenck. They shall 
be examined separately. 


THE EYSENCK STUDY


In The Psychology of Politics Ey- 
senck states, “When we average the 
average scores of the groups on the 
T factor, i.e., without paying atten- 
tion to the fact that the number of 
cases is different between the groups, 
we find that the Liberals are the most 
tender-minded with a score of 7.7; 
that the Socialists and Conservatives 
follow next, with a score of 7.0; and 
the combined Communist-Fascist 
group has much the most tough-minded
score (5.5)”³ (8, pp. 137-138).
In evaluating this conclusion
the sampling, measuring, and analysis
procedures shall be treated in
order.


Sampling 


The largest group of subjects were 
middle-class adherents of the Conservative,
Liberal, and Socialist parties.
A smaller sample of working-class
members of the same parties was
also obtained. In addition, com- 
munist subjects were recruited from 
two branches of the Communist 
party and a few fascists were obtained 
in an unspecified manner. 

The basic middle-class sample.
This was composed of 250 middle-class
members of each of the three
major British political parties. The
clearest statement of the sampling
procedure may be found in an article
published in 1947 (5, pp. 53-5[illegible]).
Students in university classes, University
extension classes, and in
W. E. A. classes were required to
give from five to fifteen questionnaires
each to friends and acquaintances
and have them answered. In
this fashion 317 usable questionnaires
were collected from individuals identifying
themselves as supporters of
the Conservative party, 256 Liberals,
and 409 Socialists. Three samples of
250 each were drawn from each of the
parties so that they were roughly
equated for age, sex, and education.
The respondents came from an urban
background (5, p. 54).

The working-class sample. No
specific information is given as to
how this sample was selected. It is
inferred that they were also given
questionnaires by members of Eysenck's
classes since he notes that
“The method of selection adopted has
been explained in some detail in the
first paper of this series...” (6, p.
200). Since the paper referred to was
devoted exclusively to middle-class
respondents it cannot be determined
whether the working-class questionnaires
were obtained at the same
time as were those of the middle-class
respondents or at a later date.
The number of protocols is much
smaller, being 65 for the Conservatives,
27 for the Liberals, and 45 for
the Socialists (6, Table I, p. 201).
These respondents were also urban
but there were no controls on age,
sex, or education (6, p. 200).

³ Socialists refer to members of or voters
for the British Labor Party.
The communist sample. The procedure
utilized in sampling communists
differed since these subjects
were recruited directly through the
party organization. The total information
available is: “Contact was
made with Party Branches through a
member of the Communist Party
who undertook to collect the questionnaire
replies. He used two different
branches, one primarily working-class,
the other primarily middle-class.
Relatively few refusals were
encountered among those approached,
in spite of a feeling that this type of
work was ‘futile’ ” (6, p. 200). Fifty
protocols were collected from middle-class
communists and 96 from working-class
communists (6, Table I,
p. 201).

The fascist sample. The only information
available as to the recruitment
of these subjects is the single
sentence, “Only seven middle-class
people could be found who were followers
of Mosley and may properly
be called ‘fascists’ ” (6, p. 206).

Comparisons of samples. In the
earlier article dealing with the middle-class
respondents Eysenck argued
for what he termed analytic sampling,
i.e., he was interested in comparing
attitudes of members of the
various parties when other variables
affecting (or possibly affecting)
attitudes were held constant: age,
sex, education, and place of residence
(5, pp. 53-58). This is a legitimate
procedure and there is no quarrel with
it. Since no significant differences
were found to be related to the first
two of these in the basic middle-class
sample, controls on them were
dropped for the working-class, communist,
and fascist samples. Such a
procedure is based upon an implicit
assumption that if there are no relationships
between certain variables
in one sample, there will be none in
a sample of quite a different nature.
Such an assumption may be valid
but it needs to be demonstrated because
it has been shown that different
relationships between attitudinal variables
hold in middle- and working-class
samples (4, pp. 171-172; 10,
pp. 58-61). In other words, differences
found by Eysenck between
middle- and working-class adherents
of the major parties might well be a
result of uncontrolled factors and not
simply a result of different class membership.

These same criticisms might be ex- 
pected to apply with even greater 
force to comparisons between mem- 
bers of major British political parties 
and those belonging to the commu- 
nist or fascist parties. Almond’s sam- 
ple of middle-class communist de- 
fectors indicates strongly that they 
were a deviant group aside from 
their political idiosyncrasies (2). 
There is another problem which may 
lead to bias in the comparisons be- 
tween the communists and others. 
The communists were recruited 
through an organization which im- 
plies active political interest: adher- 
ents of the major political parties 
may or may not have been politically 
active since they classified themselves 
as to “... the group in which you
would include yourself” when pre- 
sented with a list of parties (5, Table 
I, Q. 47, p. 78). It is well known 
that people who belong to groups 
differ in many respects from those 
who do not (10, pp. 61-63). Now it 
may be argued that communists are 
by definition active group members 
and thus differ from the majority of 
the rest of the population. It would 
nevertheless be extremely important 
to know whether they are less differ- 
ent from those who are politically 
active in major parties than from 
those who merely list themselves in 
a particular party when asked to do 
so. In short, to what extent are dif- 
ferences in attitudes between com- 
munists and major party members 
traceable to ideology per se and to
what extent to other factors relating
to political activity?

Eysenck’s failure fully to consider 
the implications of biases in sampling 
can best be illustrated by his discus- 
sion of the representativeness of his 
basic middle-class sample. He quite 
rightly notes that the initial results 
should not be generalized to rural or 
working-class populations. He subsequently
admits that all British urban
middle-class people did not have an 
equal probability of being drawn 
since his students were not suffi- 
ciently widely acquainted. He then 
says, “... it seems unlikely that this
principle would affect very many 
middle-class people, or that it would 
be correlated in any systematic way 
with the type of attitude which is 
being studied. Careful scrutiny of 
the papers written by the students, 
and verbal questioning after discus- 
sion of sampling procedures, did not 
reveal any suggestion that our sam- 
ples were seriously biased; while this 
conclusion cannot, of course, be accepted
as definite proof, it is perhaps
near enough the truth not to affect 
our conclusions in a very serious man- 
ner” (5, pp. 57-58). 

It is the present contention that very serious biases were present
in Eysenck’s basic middle-class sample as a result of the sampling
procedures. Comparisons of his samples with estimates of the parent
middle-class population indicate this very clearly.

First, consider the age distribution of Eysenck’s basic middle-class
sample. He dichotomized sample members as older and younger, the
cutting point being thirty years of age (5, p. 57). The actual range
and distribution of ages of respondents is not given (although
curiously enough the range of ages of the students collecting the
data is given!) (5, p. 54). Presumably, the respondents must have
been of or almost of voting age or the results would be almost
meaningless. According to present calculations approximately 20 per
cent of the adult British population in 1951 was in the 20-29 year
age range with 80 per cent falling over thirty years of age (based
on [reference illegible], Table 7, p. 8). Yet 64.9 per cent (487 out
of 750) of Eysenck’s respondents were under 30 years of age
(calculations based on 5, Table I, p. 78). What has happened is
quite clear. Eysenck’s students tended to choose as respondents
friends and acquaintances who were near their own age level and the
entire distribution was skewed toward the younger age groups.

In view of this, the fact that Eysenck found only a slight but
insignificant tendency for the younger members of his sample to be
more radical is not at all puzzling. He says in reference to this,
“The failure of the old in [the] sample, to be more Conservative
than the young, is perhaps also [in] opposition to expectation ...”
(5, p. 68). It is an elementary statistical principle that a
truncated distribution obscures relationships. This, it is
suggested, is the most probable reason for the lack of
differentiation between Eysenck’s so very young “young” group and
his not so old “old” group.
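
The principle invoked above (restriction of range) can be illustrated with a small simulation. This is only a sketch under invented assumptions: the linear age effect of 0.05 per year, the unit noise, the uniform age distribution and the cut-off at 35 are all hypothetical and are not taken from Eysenck's data.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson product-moment correlation."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: ages spread evenly over 21-80, and an
# attitude score that rises slowly with age plus individual noise.
ages = [rng.uniform(21, 80) for _ in range(20000)]
scores = [0.05 * age + rng.gauss(0, 1) for age in ages]

r_full = pearson_r(ages, scores)

# Truncated sample: only the young end of the age range is retained,
# mimicking a sample recruited among the students' own age-mates.
young = [(a, s) for a, s in zip(ages, scores) if a < 35]
r_truncated = pearson_r([a for a, _ in young], [s for _, s in young])

print(f"full range r = {r_full:.2f}, truncated r = {r_truncated:.2f}")
```

With the same underlying relationship, the correlation computed on the truncated sample is much smaller than the one computed over the full age range, which is the sense in which truncation obscures relationships.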

A similar criticism may be leveled against the educational bias in
the sample. Those “who have had a university education” totaled 57.7
per cent (computed from 5, Table I, p. 78); the precise definition
of university education, whether mere attendance or graduation, is
not specified. There are no figures available which give the number
of university graduates or of those with some university attendance
in Great Britain.

It is possible, however, to piece together bits of information which
indicate that Eysenck’s sample is extremely highly educated as
contrasted to the British middle-class.
British census data give a detailed 
breakdown of university attendance. 
In 1950-51 there were 102,012 full 
and part-time university students in 
Great Britain who were taking 
courses (14, Table 108, p. 90) and 
17,337 first degrees were given (com- 
puted from 14, Table 111, p. 92). 
Comparable figures for the United 
States in 1950 are 2,659,021 students 
attending colleges and universities 
(16, Table 140, p. 125) and 432,058 first degrees (16, Table 139,
p. 124). For every British university student there were 26.07
American students; for every British first degree there were 24.92
American first degrees. When corrections for the total population of
the two countries are made it is apparent that the ratio of
university students in England and the United States is
approximately one to eight or nine. It is impossible to determine
the effects of foreign students upon these comparisons although it
is believed not to affect the above comparisons markedly; less than
10 per cent of British full-time students in 1950-51 were from
outside the United Kingdom (computed from 16, Table 108, p. 90).
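
The ratios quoted above can be checked directly from the enrolment and degree counts given in the text. The population figures used for the per-capita correction below are not given in the article; they are round approximations (about 151 million for the United States in 1950 and about 50 million for the United Kingdom in 1951) supplied here purely as an assumption to show how a ratio of "one to eight or nine" could arise.

```python
# Counts quoted in the text.
uk_students = 102_012
uk_first_degrees = 17_337
us_students = 2_659_021
us_first_degrees = 432_058

student_ratio = us_students / uk_students
degree_ratio = us_first_degrees / uk_first_degrees

# ASSUMPTION: approximate total populations, not taken from the article.
us_population = 151_000_000
uk_population = 50_000_000

# American students per British student after allowing for population size.
per_capita_ratio = student_ratio / (us_population / uk_population)

print(f"students: {student_ratio:.2f} to 1")
print(f"first degrees: {degree_ratio:.2f} to 1")
print(f"per capita: roughly {per_capita_ratio:.1f} to 1")
```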

Fortunately for present purposes, the census data in the United
States (since 1940) contain estimates as to the amount of education
received. Thus in 1950, 5,784,570 people claimed four or more years
of college (based upon a 3 1/3 per cent sample) (15, Table A,
p. SB-12). Thus 5.9 per cent of the 97,403,307 persons over 21 years
of age in 1950 claimed college graduation. If roughly half the
American adults were to be considered middle-class the proportion of
college graduates would be around 12 per cent among them (since
education is a measure of
class).4 A recent estimate by the
Census Department indicates that 
15.4 per cent of the adult American 
population has had some college edu- 
cation (or roughly 30 per cent of the 
middle-class) (cited in 13, p. 238). 
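
The arithmetic behind the 5.9 and 12 per cent figures can be sketched as follows. The step from 5.9 to about 12 per cent rests on two assumptions which the text states or implies: that half of all adults are middle-class, and that essentially all college graduates fall within that half.

```python
# Figures quoted in the text.
college_graduates = 5_784_570
adults_over_21 = 97_403_307

share_of_adults = college_graduates / adults_over_21

# ASSUMPTION (from the text): roughly half of adults are middle-class,
# and graduates are treated as belonging entirely to that half.
middle_class_fraction = 0.5
share_of_middle_class = share_of_adults / middle_class_fraction

print(f"graduates among all adults: {share_of_adults:.1%}")
print(f"graduates among the middle class: {share_of_middle_class:.0%}")
```

The same doubling converts the 15.4 per cent with some college education into roughly 30 per cent of the middle class.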

Since age distributions and the rel- 
ative rate of growths of institutions 
of higher learning differ slightly in 
the United States and Great Britain 
present estimates are rough. Applica- 
tion of the ratios previously deter- 
mined to the preceding figures would 
suggest that the proportion of adults 
in the British middle-class (similarly 
assuming roughly half the population 
as middle-class) having a university 
degree would not be much above 2 
per cent and that of those having 
some university education above 5 
per cent.5 These are rough estimates but it is believed that they
are not grossly in error. They are so far below the 57.7 per cent of
Eysenck’s sample that it is clear that his sample was completely
unrepresentative of the British middle-classes.

It would be tedious to continue demonstration of other aspects of
the nonrepresentative nature of Eysenck’s middle-class sample. Such
biases might well be expected to have a major effect on the
attitudes elicited from subjects. Eysenck notes that the correlation
between social class and political attitudes in Great Britain is .67
(8, p. 19), “... that social class estimates are determined almost
completely by social status” (8, p. 20) and that, “... education
... is of course so closely related to status that the results are
almost a foregone conclusion” (8, p. 20).

In view of the fact that Eysenck’s basic middle-class sample is
markedly unrepresentative of the British middle-classes, it would be
highly dangerous to project their attitudes to obtain an estimate of
the parent populations. Yet Eysenck suggests this possibility both
in an earlier article (5, p. 57) and in The Psychology of Politics
(8, p. 127).

Of crucial importance for the present discussion, however, is the
fact that the comparisons of scale scores among groups belonging to
different social classes and political parties embody not only these
differences but many other uncontrolled biases as well. No
confidence can be placed in the generality of the differences in
scores found among the groups studied with the exception of
comparisons within the middle-class where age, sex, and education
were roughly controlled.

4 This is a crude estimate. Centers (3, Table 8, p. 57) broke down a
1945 Gallup cross-sectional sample of white males of 21 years of age
or over in the United States into the following groupings for urban
residents: all business, professional, and white collar (N=430); all
urban manual (N=414). The rural categorization was: farm owners and
managers (N=153); farm tenants and laborers (N=69). If the initial
categories are considered middle-class, slightly over half the
sample would be so classified. If nonwhites had been included the
proportion of middle-class would presumably decrease slightly. In
view of sampling errors (3, Table 1, p. 38) an estimate of 50 per
cent seems a reasonable approximation.

5 The question of the comparability of the proportions of the
population in similar classes in Great Britain and the United States
is a puzzling one since different criteria are apparently used by
the Gallup organizations in the two countries. Centers found 43 per
cent of his sample classified themselves as middle-class (3, Table
18, p. 77). Eysenck presents data on British class identification
(8, Table III, p. 18). Present calculations indicate that 41.7 per
cent identified themselves as either middle or lower-middle class
(in obtaining this figure the computed total of 8,890 was used as a
denominator since Eysenck’s addition is erroneous). Since subjective
class identification is substantially correlated with external
ratings of class membership, these figures suggest that the
proportion of middle-class individuals in the two countries is not
too dissimilar. However, the wording of the alternatives in the two
surveys differed and differences in the meaning of class labels in
the countries is an unknown factor.


The Measurement of “Tough-Mindedness”


Rokeach and Hanley (12) have discussed Eysenck’s T
(“tough-mindedness”) factor.6 A re-examination of the portions of
Eysenck’s work to which they refer clearly indicates that the mean
scores reported by Eysenck are in disagreement with the data which
he reported. Aside from such computational errors, there are other
aspects of the T scale which are relevant in any attempt to discover
the significance of scores on the T scale made by samples of
different political affiliation.

Three biasing factors which may be empirically related to the scores
made on the T scale among the samples examined by Eysenck have been
uncovered. These are: (a) the treatment of the “no-answer” category,
(b) the asymmetric nature of the scale, and (c) the different
interpretation of the items among various samples. Each of these
biasing effects shall be considered separately.

Treatment of the “no-answer” category. In most attitude scales the
treatment of neutral categories is explicitly or implicitly based
upon the assumption that a respondent who does not have an attitude,
cannot make up his mind as to the answer to the specific question
asked, or otherwise does not agree or disagree, is not an extremist
in terms of whatever the scale presumably measures. Likert-type
scales are constructed so that such a reply (or lack of one) to an
item is scored intermediately between acceptance and rejection.

The scoring system employed by Eysenck is based upon other
(unspecified) assumptions. Nine of the fourteen items (see Fig. 1)
are “tough-minded” and five are “tender-minded.” Respondents are
allowed to choose among the following alternatives: “strongly
approve” (++), “approve on the whole” (+), “can’t decide for or
against, or if you think that the question is inadequately worded”
(zero), “disapprove on the whole” (-), or “strongly disapprove”
(--) (8, p. 122). If the particular item being scored is one of the
five tender-minded ones, agreement, whether “strong” or “on the
whole,” is given one point. All forms of neutrality and disagreement
are arbitrarily scored as zero on the T scale. If the item in
question happens to be one of the nine tough-minded ones,
disagreement of any variety is given a weight of one. Any form of
agreement and “zero” responses are given no weight.

6 The present critique has been modified as a result of Rokeach and
Hanley’s analysis to minimize duplication.

Such a scoring system has interesting properties. A respondent who
disagreed with everything, whatever the content, would automatically
have a score of nine. Disgruntlement, in this case, leads to a score
which is more tender-minded (by virtue of the scoring system) than
would be true in the case of one who accepted all items. Acceptance,
however strong, of all items would result in a maximum total score
of five. If, however, a respondent, for one reason or another, could
not agree or disagree with any item his final score would be exactly
zero.
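
The scoring rule just described can be written out as a short function, which makes the three limiting cases easy to verify. This is a sketch of the rule as it is described in this critique, not Eysenck's own procedure in any further detail; the item numbers are those listed for the four quadrants in Fig. 1, and the function name and the response codes are illustrative choices.

```python
# Item numbers as listed under Fig. 1 (nine tough-minded, five tender-minded).
TOUGH_ITEMS = frozenset({29, 9, 23, 15, 13, 1, 3, 39, 5})
TENDER_ITEMS = frozenset({16, 28, 10, 8, 36})

AGREE = {"++", "+"}
DISAGREE = {"-", "--"}


def t_score(responses):
    """Score a dict of item number -> '++', '+', '0', '-' or '--'.

    One point for agreeing with a tender-minded item, one point for
    disagreeing with a tough-minded item; everything else, including
    every '0' response, earns nothing. Higher totals are therefore
    more tender-minded.
    """
    score = 0
    for item, answer in responses.items():
        if item in TENDER_ITEMS and answer in AGREE:
            score += 1
        elif item in TOUGH_ITEMS and answer in DISAGREE:
            score += 1
    return score


ALL_ITEMS = TOUGH_ITEMS | TENDER_ITEMS
print(t_score({i: "--" for i in ALL_ITEMS}))  # rejects everything
print(t_score({i: "++" for i in ALL_ITEMS}))  # accepts everything
print(t_score({i: "0" for i in ALL_ITEMS}))   # cannot decide on anything
```

The three calls give 9, 5 and 0 respectively, matching the cases discussed in the text: the undecided respondent lands at the tough-minded extreme.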

The logic of a scale which classifies a person who disagrees with
every item of a battery of items as high in tender-mindedness and
one who “can’t decide for or against, etc.” as being at the extreme
pole of tough-mindedness is difficult to understand. Of more direct
interest is the extent to which Eysenck’s use of the no response
category affects the T-scale comparisons of his samples. If there
were no systematic differences in response sets among the members of
the various samples the possibility of bias would be largely
vitiated. Since, however, there are more tough- than tender-minded
items the scoring of the zero category operates to make samples
characterized by a high proportion of indeterminate answers more
tough-minded. Unfortunately, Eysenck does not report
the frequency of these responses 
among members of the various sam- 
ples. He does, however, roughly re- 
port the frequency of extreme re- 
sponses (++ and ——). “... we 
find that only 35 per cent of the 
socialist, liberal and conservative re- 
sponses have been marked in this 
fashion, but 54 per cent and 51 per 
cent respectively of the middle-class 
and working-class communist re- 
sponses” (6, p. 206). Among the 
seven fascists, “It is of interest to 
note that these subjects were the 
most emphatic of all, their propor- 
tion of ++ and —— scores being 
67 per cent” (6, p. 207).

The above figures indicate that 
since the communist and fascist sam- 
ples checked more extreme responses 
than did members of the three major 
parties, they also had a lower fre- 
quency of zero responses. Such an 
inference is in agreement with the 
known relationship between ex- 
tremity and intensity of attitude. As 
Eysenck notes in a discussion of previous
research, “... when the different
groups who had taken part in
the study were compared there was 
a marked tendency for the more ex- 
treme groups to be more certain of 
their opinions. This characteristic 
we shall find again in our discussion 
of Communist and Fascist ideologies”
(8, p. 120).

On the basis of available data it
can be inferred that the conservatives,
liberals, and socialists sampled
had a higher frequency of zero re- 
sponses than did the samples of com- 
munists and fascists. The arbitrary 
system of scoring which treated zero 
responses as tough-minded thus in- 
troduced a bias of unknown extent 
in the direction of making the mem- 
bers of the three major parties more 
tough-minded, relatively speaking, 
than those of the two deviant parties. 
The asymmetric nature of the scale. 

In the present analysis of the T scale 
considerable importance is attached 
to the fact that the items also meas- 
ure radicalism and conservatism. Of 
the fourteen items, nine have a 
higher saturation on R (radicalism 
and conservatism) than on T. No 
item is clearly an independent meas- 
ure of T. The original T scale was 
based upon a factor analysis of 40 
items responded to by the basic mid- 
dle-class sample previously discussed. 
A total of 23 items had saturations of 
+.20 or greater upon the T factor. 
It is impossible upon the basis of an examination of the saturations
to determine why some items were included in the T scale when other
items with higher saturations were not (8, Table XX, p. 129). This
is in direct conflict with what Eysenck says in The Psychology of
Politics: “... we must obviously construct measuring instruments for
R and T respectively. Two scales were accordingly constructed by
combining the items most highly correlated with the two factors
respectively, each scale consisting of 14 items” (8, p. 133, italics
mine).

The reliability of the T scale was
.64 (.80 when corrected by the Spearman-Brown
formula) and .81 on the
R scale (.90 corrected) (5, p. 65). The
lower reliability of the T scale is not
surprising since the variance accounted
for by each is 8 and 18 per
cent respectively (5, p. 59), some of
the items in the R scale have practically
no saturation on T, and the
lowest saturation of any R-scale item 
on R is .45 in contrast to the low of 
.20 of T-scale items on T (8, Table 
XX, p. 129). 

The crucial point in an interpreta- 
tion of Eysenck’s results is that the T 
scale is a somewhat better measure of 
R than T. The mean loading of T-scale
items on T is .38, on R, .48 (calculated
from data in 8, Table XX, p. 129).

If we consider, as Eysenck does, communists to be both tough-minded
and radical, fascists to be tough-minded and conservative,
conservatives to be tender-minded and conservative, and socialists
to be tender-minded and radical, certain consequences follow in the
determination of the scores of members of these parties. An
examination of Fig. 1 indicates that there are unequal numbers of
items in the four quadrants. When this fact is combined with
Eysenck’s scoring system, strange results may be expected. Assume
that a hypothetical consistent communist and his fascist counterpart
were answering the items in the T scale, hypothetical perfection
being defined as being well indoctrinated in their respective
ideologies (radical and conservative), and that both were equally
tough-minded. Both would receive a total of exactly zero points for
rejecting the five tender-minded items in the T scale. The
hypothetically perfect communist would receive five points for
rejecting the five items in the conservative tough-minded quadrant
and no points for accepting (however strongly) the four items in the
radical tough-minded quadrant. The consistent fascist, on the other
hand, should reject the four items in the radical tough-minded
quadrant (four points) but would receive no points for accepting the
five conservative tough-minded items.

[Figure: scatter plot of the 14 T-scale items, with “Tough-minded”
at the top and “Tender-minded” at the bottom of the vertical (T)
axis, and Radical at the left and Conservative at the right of the
horizontal (R) axis; both axes run from -.70 to .70.]

FIG. 1. A PLOT OF THE 14 T-SCALE ITEMS ON THE T AND R AXIS (TAKEN
FROM 5, P. 81). THERE IS A CORRELATION OF -.12 BETWEEN R AND T
(5, P. 66).

The actual items are (5, Table 1, pp. 76-77):

Radical “Tough-minded” Quadrant
29. Men and women have the right to find out whether they are
    sexually suited before marriage (e.g., by companionate marriage).
 9. Sunday observance is old-fashioned, and should cease to govern
    our behavior.
23. Divorce laws should be altered to make divorce easier.
15. The laws against abortion should be abolished.

Conservative “Tough-minded” Quadrant
13. Conscientious objectors are traitors to their country, and
    should be treated accordingly.
 1. Coloured people are innately inferior to white people.
 3. War is inherent in human nature.
39. The Japanese are by nature a cruel people.
 5. Persons with serious hereditary defects and diseases should be
    compulsorily sterilized.

Conservative “Tender-minded” Quadrant
16. Only by going back to religion can civilization hope to survive.
28. It is right and proper that religious education in schools
    should be compulsory.

Radical “Tender-minded” Quadrant
10. It is wrong that men should be permitted greater sexual freedom
    than women by society.
 8. In the interests of peace, we must give up part of our national
    sovereignty.
36. The death penalty is barbaric, and should be abolished.


By virtue of an asymmetric dis- 
tribution of items combined with 
Eysenck’s singular scoring system, 
a hypothetically consistent fascist 
is automatically made more tough- 
minded by one point than a hypo- 
thetically consistent communist. The 
confusion inherent in such a scoring 
system becomes the more puzzling 
since Eysenck persists in lumping 
fascists and communists together as 
being tough-minded. 

A similar analysis indicates that 
a hypothetically consistent socialist 
should be more tender-minded by 
one point than a hypothetically con- 
sistent conservative. Since many of 
the differences in mean scores on the 
T scale are less than a point apart, the 
preceding indication of the impor- 
tance of differential weighting aris- 
ing from the scoring system takes 
on crucial importance. The field of 
attitude measurement is bedeviled 
with enough problems without in- 
cluding scales with built-in biases 
based upon unspecified assumptions. 
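
The one-point differences described above follow mechanically from the unequal quadrant sizes. The sketch below assumes, as the argument in the text does, that a "hypothetically consistent" party member accepts every item in his own quadrant and rejects every other item; the quadrant sizes are those of Fig. 1, and the function name is an illustrative choice.

```python
# Number of T-scale items in each quadrant (from Fig. 1).
QUADRANT_SIZES = {
    ("radical", "tough"): 4,
    ("conservative", "tough"): 5,
    ("conservative", "tender"): 2,
    ("radical", "tender"): 3,
}


def consistent_score(own_quadrant):
    """T-scale score of a respondent who accepts only his own quadrant.

    Scoring rule as described in the text: one point per accepted
    tender-minded item and one point per rejected tough-minded item.
    Lower totals are more tough-minded.
    """
    score = 0
    for quadrant, n_items in QUADRANT_SIZES.items():
        accepted = quadrant == own_quadrant
        pole = quadrant[1]
        if pole == "tender" and accepted:
            score += n_items
        elif pole == "tough" and not accepted:
            score += n_items
    return score


communist = consistent_score(("radical", "tough"))
fascist = consistent_score(("conservative", "tough"))
socialist = consistent_score(("radical", "tender"))
conservative = consistent_score(("conservative", "tender"))

print(communist, fascist, socialist, conservative)
```

Under these assumptions the fascist scores one point lower (more tough-minded) than the communist, and the socialist one point higher (more tender-minded) than the conservative, purely because the quadrants contain different numbers of items.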

Interpretation of the items. In the preceding section we have dealt
with the responses which a hypothetical communist might make to the
T scale. The underlying assumption was that he should reject all
items except those which were radical and tough-minded since these
are the characteristics which Eysenck attributes to members of the
Communist Party. An examination of Eysenck’s data indicates that the
communists sampled responded in quite a different fashion.

Table 1 shows the mean percentage acceptance of items falling in
these quadrants by communists and members of other parties (with the
exception of the small fascist sample for whom data are not given).
Both middle- and working-class communists show markedly greater
acceptance of radical tough-minded items and greater rejection of
conservative tender-minded items than subjects affiliated with other
parties. These results are completely in line with Eysenck’s
analysis and the expectations of anyone familiar with political
attitudes.

TABLE 1

[MEAN PERCENTAGE ACCEPTANCE OF] T-SCALE ITEMS BY QUADRANT,
SOCIAL CLASS, AND POLITICAL AFFILIATION*

                            Middle-Class             Working-Class
Item Nos.               Comm. Soc. Lib. Cons.    Comm. Soc. Lib. Cons.

Tough-minded
  Radical
    29, 9, 23, 15         94   61   42   33        82   52   22   42
  Conservative
    13, 1, 3, 39, 5       10   29   39   53        18   44   50   67
Tender-minded
  Conservative
    16, 28                00   34   56   66        05   20   70   72
  Radical
    10, 8, 36             84   73   58   43        80   49   42   43

* Taken from (6, Table III, p. 203).

However, the theoretically crucial responses are communist responses
to T-scale items which fall into the other two quadrants. Both
middle- and working-class communists are markedly less receptive to
the items falling in the conservative tough-minded quadrant than
other sample members and more receptive to those items falling in
the tender-minded radical quadrant. Examination of these figures
suggests that the communists sampled are not responding to the
tough- or tender-mindedness of T-scale items but rather to the
radical or conservative content. Our calculations indicate that the
middle-class communist sample had a mean acceptance of 91 per cent
of the seven T-scale items (four tough- and three tender-minded)
with a radical saturation and only 7 per cent of the seven items
(five tough- and two tender-minded) with a conservative loading.
Comparable figures for working-class communists are 81 and 14 per
cent. On the other hand, if the responses to the nine tough-minded
items are examined, the mean acceptance is [illegible] per cent by
both middle- and working-class communists and the mean acceptance of
the five tender-minded items is 47 and 50 per cent respectively.

The present interpretation of these figures is simple. The
communists sampled by Eysenck responded to the items not upon the
basis of their loadings on tough- and tender-mindedness but
responded directly in terms of their radical-conservative loading.
The T scale simply does not apply to communists (or at least to
those sampled). Comparisons of scores made by communists on a scale
on which they did not respond along the continuum measured with
scores made by other samples are meaningless.

Analysis. An analysis of Eysenck’s
treatment of results also leads to
questions of interpretation. Let it be 
assumed (as it assuredly is not) that 


there are no problems in sampling 
or measurement in the data which 
Eysenck reports. Let it further be 
assumed (despite our agreement with 
Rokeach and Hanley’s recomputa- 
tions) that Eysenck’s addition is cor- 
rect. It is still possible to raise ques- 
tions about the manner in which the 
data are treated. 

It has been previously noted that 
the comparisons of various “groups” 
involved an “average of an average 
score.” The reported score of liberals 
on the T scale thus was based upon 
the singular procedure of adding the 
mean (7.9) of 250 middle-class lib- 
erals (as sampled) to that (7.4) of 27 
working-class liberals (as sampled) 
and dividing by two and then round- 
ing up the “average of an average” 
of 7.65 to 7.7 in deriving the tough- 
mindedness score of liberals. 

The results obtained from the “‘av- 
erage of an average” treatment lead 
to even more remarkable results when 
applied to the combined communist- 
fascist samples. The only way in 
which the writer is able to arrive at 
the “average of an average” score re- 
ported by Eysenck is as follows: add 
the mean T-scale score of the 50 mid- 
dle-class communists to that of the 96 
working-class communists and divide 
by two which gives a score of 6.4; 
take the mean score of seven fascists 
and add it to the previous figure, di- 
vide by two, and round down the 
“average of an average” of 5.55 to 
5.5 to obtain the figure given by 
Eysenck.7
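
The difference between an "average of an average" and a mean weighted by sample size is easy to see with the liberal figures quoted above. The sketch below uses only numbers given in the text; the fascist mean is not stated there and is simply back-calculated from the 6.4 and 5.55 figures, so it should be read as an implied value rather than a reported one.

```python
# Liberal samples as quoted in the text: (mean T score, sample size).
middle_class_liberals = (7.9, 250)
working_class_liberals = (7.4, 27)

# "Average of an average": each sample mean counts equally,
# regardless of how many respondents it is based on.
average_of_averages = (middle_class_liberals[0] + working_class_liberals[0]) / 2

# Conventional mean over all 277 liberals, weighted by sample size.
total_n = middle_class_liberals[1] + working_class_liberals[1]
weighted_mean = (
    middle_class_liberals[0] * middle_class_liberals[1]
    + working_class_liberals[0] * working_class_liberals[1]
) / total_n

# Fascist mean implied by averaging it with the communist figure of 6.4
# to reach 5.55 (back-calculated, not quoted in the text).
implied_fascist_mean = 2 * 5.55 - 6.4

print(f"average of averages: {average_of_averages:.2f}")
print(f"weighted mean: {weighted_mean:.2f}")
print(f"implied fascist mean: {implied_fascist_mean:.2f}")
```

The weighted mean of 7.85 agrees with the figure for liberals in the second column of Table 2. Under the procedure described, the seven fascists carry as much weight as the 146 communists combined.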

Various possible comparisons of the scores of various samples
(taking middle- and working-class together) are given in Table 2.
When the mean score is used instead of the “average of an average”
on Eysenck’s reported sample means the communists are less deviant
from the other groups. If a mean is taken on the T scores as
recomputed by Rokeach and Hanley they become even less deviant. If
one wished to weight the middle- and working-class means by their
relative proportion in various political parties still different
figures would be found.

TABLE 2

COMPARISON OF TOUGH-MINDEDNESS SCORES OF MEMBERS OF VARIOUS
SAMPLES BY PARTY

                  “Average       Mean of       Mean of Rokeach
Party             of an          Eysenck’s     and Hanley’s
                  Average”*      Means†        Means‡

Liberal             7.7           7.85            8.22
Conservative        7.0           7.46            7.51
Socialist           7.0           7.73            7.78
Communist           5.5           6.35            7.33
Fascist             -          [illegible]     [illegible]

* Taken from (8, pp. 137-1[illegible]).
† Computed from (8, Table XXIII, p. 138).
‡ Computed from (12, Table 2, p. 171).

7 An examination of the rounding practice followed by Eysenck
indicates a systematic procedure. Contrary to more customary
procedures of rounding in a consistent direction, to odd or even
numbers, etc., Eysenck’s roundings in these comparisons are such as
to maximize the discrepancy between communist-fascist and other
political groupings.


It can only be concluded that 
Eysenck presented his data in such a 
Way as to maximize the differences 
between communists and fascists on 
the one hand and other political par- 
ties on the other. The differences in 
mean T-scale scores of various sam- 
ples are less than the errors that 
might be reasonably expected to oc- 
cur from sampling biases and the 
peculiarities of the scoring system, 
It is impossible to place any reliance 
in the T-scale differences among vari- 
ous samples even if Eysenck’s unu- 
sual arithmetic practices are replaced 
by more conventional techniques. 


THE COULTER STUDY 

Eysenck believes that Coulter’s study represents confirmation of his
own findings. He states, “These results (Coulter’s) bear out in
every detail the results of the previous study (Eysenck’s), and we
may accordingly conclude that our [main] hypothesis is strongly
supported” (8, p. 142). This assertion is believed to be unjustified
upon the basis of available data.8

Sampling

Coulter gave a battery of tests to three samples. All were composed
of British working-class males (8, p. 142). One was a “neutral”
sample of either 86 (8, pp. 142, 202) or [illegible] soldiers
(8, p. 152). The criteria for selection are not specified by Eysenck
although he states that they “... constituted a fairly random sample
of the British working-class males” (8, p. 142). No information is
given as to whether these soldiers were volunteers or conscripts.
Since military samples underrepresent older age groups and those
older men in the Army tend to be “Old Army Men” who are certainly
not typical of the working-class population, it is unlikely that
such a group would even roughly approximate a random sample of the
working-class.

Coulter’s communist and fascist samples were each composed of 43
working-class males. As far as is known, no reliable estimates as to
the characteristics of the parent populations being sampled exist.
It is therefore impossible to determine the representativeness of
these samples.

8 Discussion of these data is restricted to what Eysenck reports
concerning them in The Psychology of Politics. Neither Coulter’s nor
Melvin’s theses which are germane to the topic have been published.
A copy of Coulter’s thesis was examined after completion of this
critique. It has not been necessary to modify present criticisms.
Eysenck refers (8, p. 276) to “... Melvin (1954). ...” The only
Melvin listed in the bibliography (8, p. 301), is, “Melvin, D. An
experimental and statistical study of two primary social attitudes.
Ph.D. Thesis, Univ. London Lib., 1953.” According to a letter dated
Feb. 1[?], 1955, from the University of London Library, no such
thesis had been filed and no information was available concerning
it.


Measurement 


Coulter used a revised set of R and T scales devised by Melvin. The
latter started with a pool of 60 items and factor analyzed the
questionnaires of 650 respondents of unspecified origin (8, p. 132).
Twenty of the items used by Eysenck were included (based upon a
comparison of [reference illegible], pp. 208-209, and 8,
pp. 277-279). Of these, eleven were used as measures of both R and T
by Eysenck (Items 1, 3, 8, 9, 15, 16, 23, 28, 29, 36, and 39), three
as measures of R only (Items 12, 26, and 27), three as measures of T
only (Items 5, 10, and 13), and three were not included in the
original R and T scales although they were in Eysenck’s pool of
items (Items 6, 18, and 35).

Melvin added another 40 items (Eysenck says there were [...]; 8, p. 132). An inspection of these indicates that they are fairly similar to the ones originally used by Eysenck. In the new R and T scales, the R scale was expanded to 16 items and the T scale underwent drastic revision and was expanded to 32 items.

Of the eleven items measuring both R and T in Eysenck’s scaling system only two were used in the same fashion by Melvin (Items 29 and 35). One was used as a measure of T alone by Melvin (Item 23). The other eight did not emerge in Melvin’s scales. The three original measures of T alone are not included in Melvin’s T scale. Two of the original measures of R alone used by Eysenck perform the same function in Melvin’s revision (Items 21 and 27). The other (Item 26) is used by Melvin to measure both R and T.

The scoring system used by Melvin is identical to that used in Eysenck’s original T scale. Twenty of the 32
items are in the tough-minded direc- 
tion, twelve in the tender-minded 
direction (8, pp. 276-279). It is im- 
possible to determine from the ma- 
terial Eysenck presents whether the 
asymmetry which served as a source 
of bias in his scale is also present in 
Melvin’s and this question cannot 
be answered due to the unavailability 
of the latter's thesis. It is clear, how- 
ever, that the criticism made previ- 
ously of the bias resulting from dif- 
ferential group response sets favoring 
utilization of the no response or zero 
category applies to Melvin’s revision. 
Eysenck notes that the split-half 
reliabilities of Melvin’s revision of 
the R, T, and E (emphasis) scales 
lie between .85 and .95 in “...a 
relatively unselected group” (8, p.
277). It is Eysenck’s contention that
Melvin’s research “.. . showed that 
our original results could be repro- 
duced with an entirely different set 
of items” (8, p. 132). Data are not 
available to evaluate the accuracy of 
this statement but the point is not 
germane to the present argument. If 
the scales are measuring the same 
dimension there is no reason to be- 


lieve that the uncritical application 
of the T scale to communists would 
not be subject to the same bias as 
that demonstrated in Eysenck's work. 
If, on the other hand, they are meas- 
uring something different we are left 


in an even more puzzling situation as 
to what comparative scores on the 
tests mean. 

Analysis. The means of the “neutral,” communist, and fascist groups on the T scale are not given by Eysenck. However, the distribution of scores of the latter two groups is given and a point is presented which represents the mean score of the “neutral” group (8, Fig. 26, p. 141). Our calculations indicate that the means for the various groups are as follows: “neutral,” 14.2 (interpolated approximation); communists, 11.05; fascists, 7.85. The striking point in this ordering is that the communists fall almost exactly midway between the “neutral” and fascist samples, being 3.15 units from the former and 3.2 units from the latter. Standard deviations computed from the distributions indicate that the scores of the fascists and communists differ significantly (CR of 4.18).
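
The "midway" observation can be checked directly from the three means quoted in the text. The following is only an illustrative sketch; the dictionary keys and the function name are labels chosen here, not anything from the original analysis.

```python
# T-scale means as quoted in the text (computed by the critic from 8, Fig. 26).
T_MEANS = {"neutral": 14.2, "communist": 11.05, "fascist": 7.85}


def communist_gaps(means):
    """Distance of the communist mean from the neutral and fascist means."""
    from_neutral = round(means["neutral"] - means["communist"], 2)
    from_fascist = round(means["communist"] - means["fascist"], 2)
    return from_neutral, from_fascist


gaps = communist_gaps(T_MEANS)
print(gaps)
```

The two gaps are nearly equal, which is the sense in which the communist mean sits about halfway between the other two groups.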

What this difference means is, of 
course, completely puzzling since 
there is no reason to suppose that the 
same vitiating circumstances which 
made the earlier comparisons mean- 
ingless do not apply with equal 
cogency. We see no greater reason 
on the basis of the data for lumping 
communists and fascists as different 
from a “neutral” group than for dif- 
ferentiating fascists from “neutrals” 
and communists. 

There is one further matter which 
Eysenck does not touch upon but 
which makes the comparison of 
Coulter’s samples somewhat ques- 
tionable. It was noted that a sample 
of soldiers was not very apt to be 
representative of working-class males.
Examination of the mean R-scale
score of this group raises the possi-
bility that this is a most unusual
group of soldiers.

Melvin’s R scale is eight-sevenths the length of Eysenck’s. If we assume that acceptance of the items tends to be about the same on the two scales we can compare various groups within the two studies. The plausibility of such an assumption is indicated by the fact that Eysenck’s working-class communists had a mean score of 10.7 on R (8, Table XXIII, p. 138). Multiplying this figure by eight-sevenths we find a projected mean of 12.23 as the hypothetical value of working-class communists on Melvin’s revision. The actual mean computed is 12.90 on Coulter’s sample (based on 8, Fig. 26, p. 141). When we apply the same procedure to the SD of Eysenck’s sample and our own calculations of the SD of Coulter’s data, a test of significance indicates that Eysenck’s and Coulter’s communist samples do not differ significantly in radicalism. This conclusion rests, of course, upon two assumptions: that the units of measurement in Eysenck’s and Melvin’s scales are similar and that the two samples are comparable.
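
The projection described above is a simple rescaling by the ratio of scale lengths. A minimal sketch of the arithmetic follows, using only the figures quoted in the text; the function and constant names are illustrative, and the SDs needed for the significance test are not reproduced here because the text does not report them.

```python
from fractions import Fraction

# Melvin's R scale relative to Eysenck's, as stated in the text.
LENGTH_RATIO = Fraction(8, 7)


def project(mean_on_eysenck_scale):
    """Rescale a mean on Eysenck's R scale to the length of Melvin's revision."""
    return float(Fraction(mean_on_eysenck_scale) * LENGTH_RATIO)


projected = project("10.7")   # Eysenck's working-class communists
observed = 12.90              # computed from Coulter's distribution (8, Fig. 26)
print(round(projected, 2), round(observed - projected, 2))
```

The projected value reproduces the 12.23 given in the text, and the observed mean lies well under one scale point above it.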

A similar projection cannot be made for the fascists since Eysenck’s lone seven were middle-class and Coulter’s were working-class. What is of pertinence is the projection of scores of working-class members of the major parties. Using the same procedure as with the communists, we find that the extension of Eysenck’s R-scale means (8, Table XXIII, p. 138) yields projections as follows for Melvin’s scale: conservatives, 3.2; liberals, 4.2; and socialists, 7.3. If we weight these groups by their representation in the British working-class population (as given in 7, p. 57) we find a projected mean of 5.8 on the R scale. Yet Coulter’s sample made an R-scale score of 10.8 (interpolated from 8, Fig. 26, p. 141).
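
The population weights behind the projected mean of 5.8 come from a source not reproduced in this text, so that figure is not recomputed here. What can be shown with the quoted numbers alone is that no weighting of the three party projections could reach the soldiers' observed score, since a weighted mean never exceeds its largest component. The sketch below illustrates this; the weights passed in are a deliberately extreme hypothetical, not the actual population shares.

```python
# Projected R-scale means on Melvin's scale, as quoted in the text.
PROJECTED = {"conservatives": 3.2, "liberals": 4.2, "socialists": 7.3}
SOLDIER_SAMPLE_MEAN = 10.8  # interpolated value quoted in the text


def weighted_mean(values, weights):
    """Weighted mean of group values; weights need not sum to one."""
    total = sum(weights.values())
    return sum(values[g] * weights[g] for g in values) / total


# HYPOTHETICAL extreme case: a working class made up entirely of socialists.
all_socialist = {"conservatives": 0, "liberals": 0, "socialists": 1}
upper_bound = weighted_mean(PROJECTED, all_socialist)
print(upper_bound, SOLDIER_SAMPLE_MEAN - upper_bound)
```

Even under this extreme assumption the projection stays several points below the score of the soldier sample.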

Why a group of soldiers in the British Army should be so strikingly more radical than would be expected on the basis of Eysenck’s own data is extremely puzzling. It certainly does not argue for the generality of any conclusions based upon a comparison of scores made by other groups with their own. The lack of internal consistency found in the analysis of data reported by Eysenck clearly indicates flaws in methodology. Some of these we can pinpoint with reasonable accuracy; others are not so easily traceable since the essential data are not given.


ARE COMMUNISTS AND FASCISTS SIMILAR IN BEING “AUTHORITARIAN”?


Among the scales given to Coulter’s samples was the California F scale (the particular form used is not specified). In discussing this instrument Eysenck says, “It was entitled the F scale because Adorno et al. considered it to be a measure of Fascist potential. This interpretation, however, as we shall very soon see, is in part at least erroneous as we have found Communists to make almost as high scores on this scale as Fascists, and consequently we shall in this book refer to the F-scale rather as the authoritarianism scale” (8, p. 149). A few pages later he reports that F-scale scores were: “neutral” group, 75; communists, 94; and fascists, 159 (8, pp. 152-153). The obvious fact here is that the communists do not “... make almost as high scores on this scale as Fascists ...” since the difference between communist and fascist scores is an extremely large 65 points whereas the communists differ from the “neutral” group by only 19 points. Once again, Eysenck arbitrarily lumps communists and fascists together in an attempt to indicate their similarity.
There are some singularly curious facts about the F-scale scores which Eysenck does not dwell upon. The item means for the three samples are: “neutral” group, 2.5; communists, 3.13; and fascists, 5.30 (our calculations). The range of possible scores is from 1.0 to 7.0 with 4.0 being the theoretical neutral point. The communists’ mean is below this point, indicating a general tendency to reject the items. The fascists have a high acceptance score which represents a striking confirmation of the validity of the F scale as a measure of fascistic attitudes and bluntly refutes Eysenck’s contention that the F scale does not measure fascist potential.
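
The item means quoted above are consistent with dividing the reported total scores by the number of items. The sketch below assumes a 30-item form; that length is an assumption made here for illustration (the text notes that the form used is not specified), chosen because it reproduces the quoted item means.

```python
# Total F-scale scores as reported by Eysenck (8, pp. 152-153).
TOTALS = {"neutral": 75, "communist": 94, "fascist": 159}

# ASSUMPTION: a 30-item form; the source does not state the form used.
N_ITEMS = 30
NEUTRAL_POINT = 4.0  # theoretical neutral point on the 1.0 to 7.0 item scale


def item_means(totals, n_items):
    """Convert total scale scores to per-item means."""
    return {group: round(score / n_items, 2) for group, score in totals.items()}


means = item_means(TOTALS, N_ITEMS)
gap_to_fascists = TOTALS["fascist"] - TOTALS["communist"]
gap_to_neutral = TOTALS["communist"] - TOTALS["neutral"]
print(means, gap_to_fascists, gap_to_neutral)
```

On these figures only the fascist sample falls above the neutral point, and the communist total lies far closer to the “neutral” group than to the fascists.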

The “neutral” group which is so 
fascinatingly aberrant again demon- 
strates its uniqueness. The score of 
2.50 is the second lowest score obtained 
in roughly 50 samples with which the 
writer is familiar. What makes this 
fact so interesting is that this was a 
working-class sample and working- 
class samples tend to make higher 
scores on the F scale than comparable 
middle-class samples (the correlation 
between F-scale scores and educa- 
tion, usually part of class definition, 
has been estimated as being between 
—.50 and —.60 for American sam- 
ples) (4, pp. 168-170). The only 
known group making a lower score 
than Coulter’s neutral group con- 
sisted of 26 graduate students at the 
University of California (with an 
average of 6 semesters of graduate 
work) who refused to sign a special 
loyalty oath. This group made a 
score of 1.88 (9, pp. 124-126). (A 
comparison group of signers with 
similar education scored 2.73 which 
is higher than Coulter’s “neutral”
group.)

American college students usually 
score in the 3.0 to 4.0 range on the F 
scale (1, Table 12 (VII), p. 266; 11, 
Table 9, p. 245). If American find- 
ings are applicable to British popula- 
tions we should expect a representa- 
tive sample of British working-class 
males to make even higher scores. 
Such an expectation would seem to 
be in accordance with Eysenck’s own 
data since an examination of the pro- 
portionate acceptance of T-scale 
items by working-class samples as 
contrasted with middle-class samples (see Table 1) indicates that
among sample members of all parties 
the working-class respondents ac- 
cepted more of the items falling into 


the tough-minded conservative quadrant. These have high similarity to
the sorts of items that enter into the 
F scale and the correlated E and 
PEC scales. 

It is of obvious importance to have 
comparative data on British samples. 
In so far as the writer knows, there 
is no published material of this sort. 
However, Rokeach has recently been 
in Great Britain and administered 
the F scale as well as other measures. 
The item means on two samples of 
university students were 3.26 (N 
=80) and 3.57 (N =137). These find- 
ings suggest no marked differences 
between American and British uni- 
versity students on the F scale. A 
group of 60 workers at Vauxhall 
Motors made an item mean on the F 
scale of 4.74.⁹ This higher score by
working-class men as contrasted with 
a college sample is in accordance with 
American findings and does not allay 
suspicion that there was something 
extremely unusual about Coulter's 

“neutral” working-class sample. 

Eysenck reports that Coulter’s 
sample of communists scored higher 
on the F scale than the politically
“neutral” group. Available evidence
indicates that communists tend to 
score lower than members of other 
political parties on the F scale. It 
can be argued that Eysenck’s own 
material supports the latter point
of view. An examination of Table
1 indicates that they were the least 
accepting of all samples of members 
of various parties when it came to 
the items in the conservative tough- 
minded quadrant which, as has been 
noted, are similar to F-scale items in 
meaning. It has also been argued 
that the selection of items in the T 
scale was apparently somewhat capricious. If we examine the items in this quadrant which were not included in the T scale (membership being determined from R and T factor saturations as given in 8, Table XX, p. 129) we find that a highly similar pattern of acceptance occurs. The mean percentage acceptance of the five T-scale items in this quadrant by working-class communists is 18, while of the seven non-T-scale items (Items 17, 22, 26, 27, 30, 31, and 33) it is 20. Similar comparisons on the most similar group, the socialists, give 44 and 55 per cent acceptance for items included and not included in the T scale (computed from 6, Table III, p. 203).

⁹ Rokeach, M. Personal communication. 1955. The full implications of Rokeach’s findings will be developed in his forthcoming work on political and religious dogmatism.

Direct comparisons with American samples are not available. Although a sprinkling of communists was included in the samples described in The Authoritarian Personality, their F-scale scores were not reported. However, their scores on the E scale were given as well as the correlation between the E and F scales in the samples in which these communist subjects were included. Upon the basis of this data it is clear that these communist subjects scored extremely low on the F scale (see 4, pp. 130-133, for a fuller discussion as well as the congruence of such a conclusion with earlier work with communist responses on Stagner’s measure of fascism).
We are therefore in complete disagreement with Eysenck’s conclusions that the F scale: (a) measures “authoritarianism” instead of potential fascism, or (b) that communists make higher F-scale scores than samples of members of less extreme political groups. The only support for his position comes from the incredibly low F-scale score purportedly made by Coulter’s “neutral” group. By using this score as a base and ignoring the implications of his own data and the research of others he arrives at a conclusion which we believe to be untenable.


FURTHER CONSIDERATIONS 


It is clear that Eysenck’s communist samples are neither “tough-minded” nor “authoritarian” when
the data produced as evidence by 
Eysenck are carefully examined. Our 
analysis clearly indicates that com- 
munists respond to T-scale items 
simply in terms of the radical-con- 
servative loading and not in tough- 
or tender-minded fashion. This is a 
graphic illustration of the danger in- 
herent in assuming, as Eysenck ap- 
parently did, that a scale which pre- 
sumably measures one thing in a 
“normal” population (in the statistical sense) measures the same thing in a radically different population.
This point may be clarified by considering an item. “It is wrong that men should be permitted greater sexual freedom than women by society” may be interpreted in alternate ways. It may be that the “wrong” is based upon the view that neither men nor women should be allowed sexual freedom; violation of this standard is therefore wrong. This is apparently the interpretation of the item made by the basic middle-class sample since the factor analysis of their responses placed the item on the “tender-minded” side, which is characterized by acceptance of religious and ethical items. An alternative interpretation is also possible. The item might be accepted by those who believe that both men and women should be allowed sexual freedom and it is “wrong” to restrict the sexual freedom of women. This, it is suggested, might lie behind the fact that the communists were most accepting of this “tender-minded” item (see 12, Table I). This possible explanation is supported by the fact that the communists sampled were much more approving of companionate marriage (Item 29) than any other group.

Rokeach and Hanley’s argument 
that Ferguson’s “religionism” and
“humanitarianism” factors account 
for Eysenck’s data better than the 
R and T factors is convincing. This 
can be easily demonstrated for T- 
scale items by an examination of Fig. 
1. However, it is also difficult not to 
appreciate the clear-cut radical-con- 
servative axis that appears in Ey- 
senck’s data and to agree with 
Eysenck that there are semantic ad- 
vantages in using R and T when deal- 
ing with political parties. It is con- 
tended that what weakens Eysenck’s 
position is the fact that he has no 
items which are relatively pure meas- 
ures of T. It is further argued that 
this is a direct consequence of his 
original procedure. 

Eysenck originally collected, “From a total of some 500 items, all those ... which had been shown to be of importance or relevance in any previous research. When pruned of duplications, it was found that the items did not suffice to make up the minimum number considered requisite, and others were added by random selection until 40 items altogether had been chosen” (8, pp. 121-122).
It is extremely difficult to believe that 
the 40 items used exhaust the range 
of possibly relevant or important so- 
cial attitudes (see 8, Table XVIII, 
pp. 122-124). 

It is therefore pertinent to ques- 
tion the consequences of Eysenck’s 
original item selection procedures. 
If, instead of taking items which had 
been of relevance in previous research, 
he had analyzed the definition of 
tough-mindedness and then selected, 
invented, or modified items which 
appeared relevant, and then factor 
analyzed responses to them and
other items, he might well have iso- 
lated a much purer dimension of 
“tough-tender-mindedness.” Such 
a comment implies that there are 
such items or they might be found. 
Machiavelli makes many statements 
that are tough-minded, to say the 
least, but are not concerned with sex, 
religion, nor punitive reflections upon 
man (as are Eysenck’s tough-minded 
items). Whether a tough-mindedness 
scale could be constructed whose 
items are relatively independent of 
radicalism-conservatism or not, is an 
empirical question. 

Although this is as yet an unre- 
solved problem it has a great deal to 
do with what is a key hypothesis in 
Eysenck’s theorizing: “.. . there is 
in truth only one ideological factor 
present in the attitude field, namely 
that of Radicalism-Conservatism, 
The T-factor itself does not consti- 
tute an alternative ideological system 
but is rather the projection on to the 
social attitude field of a set of person- 
ality variables” (8, p. 170). It is suggested that Eysenck was forced into
the above position as a consequence 
of his original selection of items which 
did not cover aspects of tough- and 
tender-mindedness which were rela- 
tively independent of radical-con- 
servatism. If such items exist, then he might have found two ideological factors, the one radical-conservative and the other a means-ends dimension. Does one take an amoral attitude in implementing political ideology (be it radical or conservative) or is there a concern with ethics and principles?

It is of interest to note what per- 

sonality variables Eysenck believed 
were relevant to political attitudes. 
He suggests, “... ‘tough-mindedness’ is a projection on to the field of
social attitudes of the extraverted per- 
sonality type, while ‘tender-minded- 


ness’ is a projection of the introverted 
personality type” (8, p. 174). Thus Eysenck believes that communists and fascists are extraverted whereas conservatives and socialists are introverted. Evidence comes from Coulter’s study in which TAT ratings on extraversion gave the communist and fascist samples a higher score than members of the “neutral” group (8, p. 180). The dangers of comparisons utilizing scores made by the latter group have already been indicated.

Eysenck also cites an unpublished study by George in which correlations between introversion-extraversion and T were found. There was no marked relationship with R. Neither R nor T was related to Eysenck’s other personality factor of neuroticism (8, pp. 177-179).

It is impossible to confirm or reject Eysenck’s hypotheses in any conclusive fashion upon the basis of available data. As an alternative it is suggested that both radicalism and a true tough-minded amoral orientation might well be related to personality factors but that the relationships depend upon the social setting. Any attempt to relate personality variables to political ideology without taking the social context into account is apt to be highly misleading as well as an oversimplification of some highly complex interrelationships. Thus in a study of communist defectors, Almond (2) reports marked differences between middle-class and working-class members in the patterns of motivation leading to their entry into the party. The former, on the basis of analyses of interviews, were characterized by a high incidence of neuroticism; the latter were not. These personality differences were also related to the type of role played in the party since the screening and training of party members led to quite marked role differentiation. Those who became communist elites were quite different personality-wise from those who did not. Almond presents a convincing argument indicating that different sorts of individuals are attracted to the Communist Party in different countries, at different historical periods (before and during the Popular Front period), that in some countries minority members are attracted and in others they are not, and that a host of relevant social and historical factors are operative in causing people to join the Communist Party.
Any simple statements about the “communist personality” can fairly be said to reflect a lack of appreciation for the complex social processes involved in ideological deviance. This is not to say that members of the Communist Party are not unique along certain personality dimensions. It is not to say that communists and fascists have no personality characteristics in common which differentiate them from the “politically normal” population. The point being emphasized is that there is a wide range of diversity among members of communist and fascist parties and any broad generalizations about the characteristics of communists and fascists which are based upon limited samples are highly suspect.
Despite profound disagreement with Eysenck’s methodological capriciousness and his restricted theoretical position, there are some valuable insights which can be derived from a critical analysis of his data. It is apparent that communists differ from others in the importance of the radical-conservatism dimension in responding to items. It is also clear
that he has provided, albeit unwit- 
tingly, compelling evidence that the 
F scale actually measures fascistic 
ideology. 


What is especially interesting about 
Eysenck’s data is the fact that it 
clearly refutes any notion that com- 
munists are mirror images of fascists. 
The communists sampled are mark- 
edly different not only from adher- 
ents of the major political parties 
but from fascists as well. The generalizability of what has been informally called the “Budenz-Bentley
syndrome” (authoritarians of the
right and left are similar so it is easy 
to switch from one extreme to the 
other) is not supported. It should be 
noted that Almond’s data also refute 
this hypothesis. He found that only 
10 per cent of his sample of com- 
munist defectors became religious 
converts or returnees, members of 
the extreme right, or conservatives. 
The majority (53 per cent) became 
moderate socialist or trade unionists, 
6 per cent remained on the extreme 
left, 18 per cent were politically indif- 
ferent, and the remaining 13 per cent 
were classified as “other” or “un- 
known” (2, Table 15, p. 357, and 2, 
p. 357). 

The present critique has focused 
upon Eysenck’s treatment of com- 
munists and fascists along the dimen- 
sions of tough-mindedness and au- 
thoritarianism. It would be grossly 
unfair as well as misleading to imply 


that Eysenck considers these groups as similar on all other dimensions. He cites Coulter’s analysis of TAT protocols in which it was found that the correlation between ratings of direct vs. indirect aggression was -.94 among the communists sampled and +.61 among the fascists sampled (8, p. 205).

Since Coulter’s thesis has not been 
published, a more detailed methodo- 
logical analysis is inappropriate at 
the present time. Her research is of 
interest, however, not only because 
of the many striking relationships 
found (as the magnitude of the correlations cited by Eysenck indicates)
but also because she utilized a bat- 
tery of diversified instruments. Criti- 
cism of the use of a “neutral” group 
of a highly atypical nature as a basis 
for comparisons does not necessarily 
imply that Coulter’s actual findings 
are not valuable. 


SUMMARY 


Eysenck’s treatment of the per- 
sonality of communists has been sub- 
jected to detailed analysis in the pre- 
ceding pages. It is concluded that: 

1. The samples studied are not 
representative of the parent popula- 
tions, that there is differential bias 
in the sampling of various groups, and 
that generalizations drawn from these 
samples are therefore unwarranted. 

2. The “tough-mindedness” scale leads to misleading comparisons among members of various political parties because of biases built into the scoring system. Further, the scale clearly does not measure tough-mindedness among the communists sampled since they responded to individual items in terms of their radical-conservative loading.

3. The contention that communists are “authoritarian” as measured by the F scale is unjustified since it is based on the comparison of a communist sample with a highly aberrant “neutral” group.

4. [...] which are utilized to differentiate communists and fascists from other samples are highly irregular and violate the data.

5. A re-examination of the data indicates that the communists and fascists sampled differed from one another in crucial aspects as well as being different from the various comparison groups sampled.
THE PSYCHOLOGY OF POLITICS AND THE PERSONALITY SIMILARITIES BETWEEN FASCISTS AND COMMUNISTS


H. J. EYSENCK 
Institute of Psychiatry (Maudsley Hospital) University of London 


To have one’s writings submitted 
to a very detailed and exhausting 
critique in the pages of the Psycho- 
logical Bulletin is a great honor; 
to have this happen twice is some- 
what overwhelming. Before, there- 
fore, replying to Christie's comments 
(1) I would like to take this oppor- 
tunity of thanking both him and my 
earlier reviewers (6) for drawing at- 
tention to several minor misprints 
in The Psychology of Politics (4).
While, as will be seen, I cannot agree 
with any of the major criticisms put 
forward, I shall always be indebted 
to them for their painstaking examination of the details of my book.*

It is curious how much alike
Christie (1) and Hanley and Rokeach
(6) are in their failure to deal with
the logical development of the
theories and experiments outlined in 
this book (4). Psychological theory 
and factorial studies agreed in show- 
ing that the interrelations of social 
attitudes in Great Britain required at 
least two orthogonal factors or di- 
mensions for their description; these 
factors were labeled R (for radical- 
ism-conservatism) and T (for tough- 
mindedness vs. tender-mindedness). 
Many theoretical and practical reasons are given why, descriptively,
these two factors are superior to any 
of the innumerable alternative rota- 
tions which could be made, and 
Christie appears to agree with this 
when he says that it is “difficult not to appreciate the clear-cut radical-conservative axis that appears in Eysenck’s data, and to agree with Eysenck that there are semantic advantages in using R and T when dealing with political parties.”

* Some of the points Christie makes have already been answered in my earlier reply to Rokeach and Hanley (5). The reader may like to consult this earlier paper in conjunction with the present one.

Our theoretical position leads us 
to believe that the T factor is the 
projection onto the attitude field of 
the personality dimension of extra- 
version-introversion, in the sense 
that extraverts will have tough- 
minded attitudes, introverts tender- 
minded attitudes. The content of the 
attitudes of extraverts and introverts 
respectively will be determined by
their position on the radicalism-con- 
servatism axis. It would follow from 
this hypothesis that there should be 
very few, if any, pure T items; tender- 
mindedness and tough-mindedness 
should always appear in conjunction 
with either right-wing or left-wing 
tendencies. This is what we have 
found in actual fact after an examina- 
tion of many hundreds of different 
items. It is very satisfying to find 
hypotheses supported in this way, 
yet oddly enough Christie appears to 
hold the opposite view. He writes 
“It is contended that what weakens
Eysenck’s position is the fact that 
he has no items which are relatively 
pure measures of T.” The fact that 
if many such items could be found 
the theory which has been elaborated 
in The Psychology of Politics would 
be, not just weakened, but completely 
disproved, does not seem to occur to 
Christie. He blames our procedure 
of item selection for our failure to 
find pure T items, and says that if 
the writer “had analyzed the definition of tough-mindedness and then selected, invented, or modified items which appeared relevant and then factor analyzed responses to them and other items he might well have isolated a much purer dimension of ‘tough-tender-mindedness.’ Such
a comment implies that there were 
such items or they might be found. 

Whether a tough-mindedness 
scale could be constructed whose 
items are relatively independent of 
radicalism-conservatism or not, is an 
empirical question.” 

Having attempted for many years to do what Christie advocates, and having had several students make similar attempts, all without success, the writer believes that Christie is somewhat optimistic. Perhaps if he had himself some practical experience in carrying out work of this kind he might be less inclined to dismiss the concentrated effort [...] to make one’s failure to find such items convincing evidence to the unprejudiced judge. Christie’s critique [...] no pure [...] and granted also [...] analysis of the attitude field requires two dimensions, it is clearly essential for the construction of a T scale to use items having reasonably high correlations with T, and which are selected in such a way that their correlations with the R scale balance out. Christie’s


comment on this is that “The crucial 
point in an interpretation of Eysenck’s 
results is that the T scale is a some- 
what better measure of R than T. 
The mean loading of T scale items on 
T is .38, on R .48.” The confusion 
evident in this quotation appears to 
invalidate most of Christie’s argu- 
ment as far as it relates to the con- 
struction of the T scale. The crucial 
point is that items are selected in 
such a way that if we have two tough- 
minded items one would be a radical, 
the other one a conservative item. By 
adding the two we add the T vari- 
ances and cancel out the R variances. 
As an example of this, let us consider 
an imaginary miniature scale con- 
sisting of two items. The first item 
relating, say, to trial marriages has a
loading of +.6 on R and +.5 on T;
the other item relating, say, to the
death penalty has a loading of -.6
on R and +.5 on T. For the purpose
of the T scale a “Yes” answer would
in each case be counted one point. A
person saying “Yes” to both ques-
tions would therefore get a score on
the T scale of 2, a person answering
“No” to both questions would get a
score of zero. The fact that both
items have higher correlations with R
than with T does not mean that
the sum of the answers is a good
measure of R. A person high on R
would say “Yes” to the first and
“No” to the second item; a person
low on R would reverse this. This
point would seem too elementary to
discuss in such detail, but as much
of Christie’s critique is based on it
it seemed desirable to clear it up.
Rokeach and Hanley appear to be
subject to a similar error of interpre-
tation. If Christie were, in fact, cor-
rect in his contention that the T
scale is a good measure of R, then it
should correlate with the R scale. As
the studies reported in The Psychology
of Politics (4) show, no such correla-
tions have in fact been observed.
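
The cancellation claimed for the imaginary two-item scale (loadings of +.6 on R and +.5 on T for one item, -.6 on R and +.5 on T for the other) can be checked with a little arithmetic. The following is only a sketch under stated assumptions, not a computation taken from the study: it treats the items as standardized scores, assumes R and T are orthogonal, and assumes the unique parts of the items are uncorrelated. The function name `composite_loadings` is hypothetical.

```python
import math

def composite_loadings(items):
    """Loadings of an unweighted item sum on two orthogonal factors, R and T.

    items: list of (loading_on_R, loading_on_T) pairs for standardized items.
    Assumes, for illustration only, orthogonal factors and uncorrelated
    unique parts.
    """
    sum_r = sum(r for r, _ in items)
    sum_t = sum(t for _, t in items)
    # Unique variance of a standardized item is 1 minus its communality.
    unique = sum(1.0 - (r * r + t * t) for r, t in items)
    # Variance of the sum: common part from each factor plus unique parts.
    sd = math.sqrt(sum_r ** 2 + sum_t ** 2 + unique)
    return sum_r / sd, sum_t / sd

# Imaginary loadings quoted in the text.
items = [(0.6, 0.5), (-0.6, 0.5)]
on_r, on_t = composite_loadings(items)
print(f"sum loads {on_r:.2f} on R and {on_t:.2f} on T")
```

Under these assumptions the R loadings of the two items cancel exactly in the sum while the T loadings add, which is the point of the example; with real items the loadings would not balance so neatly.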

The writer would readily admit 
that our first version of the T scale 
fell short of perfection in several 
respects; this was one reason why 
an improved version was construct- 
ed by Melvin (9). However, Christie 
is in the unfortunate position that 
if we completely accepted his criti- 
cism of the scoring system adopt- 
ed, then our results would support 
even more strongly our own hy- 
pothesis, and go counter to his. He 
maintains that “by virtue of an 
asymmetric distribution of items 
combined with Eysenck’s singular 
scoring system, a hypothetically con-
sistent fascist is automatically made
more ‘tough-minded’ by one point
than a hypothetically consistent com-
munist.” As we have throughout
found communists to be slightly less
tough-minded than fascists, Christie’s
argument would suggest that, in fact,
we should increase the communists’
scores by one point, thus making
them even more like the fascists than
appears in our results. As Christie’s
main argument appears to be that
communists are not tough-minded at
all, and are quite unlike fascists in
this respect, acceptance of his criti-
cisms of our scoring system would,
therefore, strengthen our position 
and weaken his. 

The same is true when we look at 
another comment. Christie main- 
tains that “the arbitrary system of 
scoring which treated zero responses
as ‘tough-minded’ thus introduced a 
bias of unknown extent in the direc- 
tion of making the members of the 
three major parties more ‘tough- 
minded,’ relatively speaking, than 
those of the two deviant parties.” 
Again, even if Christie's criticism 
were well taken, it would merely 
mean that we had loaded the dice
against our own hypothesis; making
the appropriate corrections would 
make our results support our theory 
even more strongly. 

Another criticism of the scoring 
system the writer does not under- 
stand at all. Christie maintains that 
“the T scale simply does not apply 
to communists (or at least to this 
sample). Comparisons of scores made 
by communists on a scale on which 
they do not respond along the con- 
tinuum measured with scores by 
other samples are meaningless.” Just 
what is meant by saying that a cer- 
tain scale “simply does not apply” to 
a certain group? One might imagine
that it would have zero, or at least 
quite low reliability for that group; 
yet Coulter has shown that the relia- 
bility of the T scale is higher for the 
communists than for fascists, or our 
neutral group (2, p. 43). Does it, 
perhaps, mean that our measurement 
of T is only a watered-down and less 
reliable measure of R? The relia- 
bility of the T scale for communists 
is higher than that of the R scale, 
and the two scales do not correlate. 
Does it, perhaps, mean that T does 
not correlate with other variables in 
the case of communists, while it does 
so in the case of fascists and other 
groups? Again, Coulter (2) has 
shown that the opposite is true, if 
anything. Is it that scores on the 
scale do not behave in conformity 
with firmly grounded theory? But 
here again, as shown in The Psychol- 
ogy of Politics (4) and the more re- 
cently concluded study by Nignie- 
witzky (10), to be discussed below, it 
is found that communists behave pre- 
cisely in the predicted manner. 

Is it that the T scale is irrelevant to 
political party structure as compared 
with the R scale? Here, Nigniewit- 
zky’s finding on a representative 
sample of the French middle-class 
population is relevant; he finds that
the T scale, while independent of the 
R scale statistically, is actually su- 
perior to the R scale in differentiating 
between members of the different 
political parties (including the com- 
munists) (10). It is submitted, 
therefore, that Christie’s statement is 
strictly meaningless. If Christie had 
quoted the relevant statistical find- 
ings, this fact would have become 
apparent immediately. 

We must now turn to the problem 
of sampling. Christie spends a con- 
siderable amount of space in trying 
to show that our middle-class sample 
was “completely unrepresentative of
the British middle class.” As the 
writer himself has stressed this point 
several times, Christie’s work ap- 
pears to be a task of supererogation. 
As was pointed out in The Psychology 
of Politics (4, p. 127): “Our interest 
lay not in obtaining a representative 
cross-section of the population but in 
comparing different political groups.
This can best be done by having the
groups of equal size, thus reducing
sampling errors to a minimum. If
mean values are wanted for the total
population, then mean values for the
selected groups can be multiplied by
the proportions these groups form in
the total population, thus giving an
adequate indication of population
values.” Again, Christie appears to
doubt this statement: “In view of
the fact that Eysenck’s basic middle- 
class sample is markedly unrepre- 
sentative of the British middle- 
classes, it would be highly dangerous 
to project their attitudes to obtain 
an estimate of the parent popula- 
tions.” 

It may be tedious to the reader to 
spell out this point in detail because 
of its quite elementary nature, but 
as Christie has devoted so much 
space to it, his misinterpretation 
requires correction. If we are in-
terested in the variance contributed
to a given score by a number of fac-
tors, such as political party, sex, age,
and education, then the most efficient 
design for giving us such informa- 
tion is obviously one in which all the 
possible groups into which these four 
methods of classification divide the 
population are represented in equal 
number. A representative sample of 
the population would be relatively 
inefficient, particularly when some of 
the groups (liberals, university-edu-
cated) comprise only a very small
portion of the population. Mean 
values from such an analytic sample 
cannot, of course, be taken as repre- 
sentative of the whole population; 
we would require to correct the fig- 
ures obtained for each subgroup by 
taking into account the proportion 
of people in that special group in the
total population. When this is done
we obtain an estimate of population
parameters which is only a little in-
ferior to one obtained from a random
sample. Thus, an analytic sample is
vastly superior to a random sample
with respect to the analysis of the
influence of different factors, and is
very little, if at all, inferior to it with
respect to obtaining estimates of
population parameters. As our pur-
pose was not that of obtaining popu-
lation parameters but of determining
the relative influence of the factors
indicated, Christie's argument ap-
pears to be quite irrelevant to the
facts of the situation.
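
The correction described in this paragraph, in which each subgroup mean is weighted by the proportion that subgroup forms of the total population, can be sketched as follows. The group labels, means, and proportions are invented purely for illustration and are not figures from the study; the function name `weighted_population_mean` is likewise hypothetical.

```python
def weighted_population_mean(group_means, population_shares):
    """Estimate a population mean from equal-sized analytic groups.

    group_means: dict mapping group label -> mean score in that group.
    population_shares: dict mapping group label -> proportion of the
        total population falling in that group (must sum to 1).
    """
    if set(group_means) != set(population_shares):
        raise ValueError("groups and shares must cover the same labels")
    if abs(sum(population_shares.values()) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    # Weight each group's mean by its share of the population.
    return sum(group_means[g] * population_shares[g] for g in group_means)

# Hypothetical values, for illustration only.
means = {"conservative": 12.0, "liberal": 14.0, "socialist": 13.0}
shares = {"conservative": 0.45, "liberal": 0.10, "socialist": 0.45}
print(weighted_population_mean(means, shares))
```

The sketch shows only the weighting step; it says nothing about how well the subgroup means themselves represent their parent subgroups, which is the point at issue between the two writers.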

It should not be assumed from this,
however, that our sampling proce-
dures are not subject to criticism
on any point. We know of no com-
plex study in social psychology which
has handled this problem with com-
plete adequacy, and we have through-
out been aware of certain weaknesses
in our sampling procedures. The de-
tails have always been given in suffi-
cient detail to enable the reader to
form his own views as to the degree 
to which our conclusions should be 
modified because of these imperfec- 
tions. In this our writings are in de- 
cided contrast to Christie's own cri- 
tique. He seems to be quite happy to 
establish a point by referring to work 
carried out by Rokeach in which 
scores are given for groups of stu- 
dents and Vauxhall Motors workers 
without any mention at all of sex 
composition, method of sampling 
used, and so forth. Critics who cavil
at the relatively full data presented
with respect to the sampling pro-
cedures used by the writer might be
oe to heed their own advice. 
In view of Christie’s failure to give
any details at all, the writer cannot
take seriously the means presented,
or the criticisms based on them.


It was fortunate that quite recently
it has become possible to carry out a
study in France, making
use of a properly selected representa-
tive sample of the French middle-
class population. This study was
carried out by R. Nigniewitzky (10)
with results which are of consid-
erable relevance to Christie's re-
marks. Communists on the new and
improved form of the T scale were
found to have a mean score of 10.3;
fascists to have a mean score of 10.2;
and fellow-travelers had a
mean score of 10.2. The mean score
of supporters of all the other main
parties was 17.6. Commu-
nists and fascists again appear as very
much more tough-minded than the
democratic parties.
These results are important for
several reasons. Christie takes us
to task for selecting communists and
fascists who were actively engaged
in political work, and comparing
them with people who voted for the
main three parties, but were not
specially active in the political world.
This, he maintains, introduced a 
sampling bias because differences 
may be due to the factor of being 
politically active rather than to being 
procommunist or profascist. This 
argument is almost impossible to 
disprove because in England mem- 
bers of the communist party and 
communist adherents generally 
are all characterized by this strong 
degree of political activation; it 
would be practically impossible to 
find communists and fascists not 
active in this way, and if any 
could be found they would be ex- 
tremely atypical. Conversely, the 
typical conservative, liberal, or so- 
cialist voter or party member, how- 
ever strong his convictions, does not 
indulge in the same kinds of activities 
as does the communist or fascist. It 
would, therefore, be not just difficult 
but impossible to find conservatives, 
liberals, or socialists carrying out, 
with equal intensity, the kinds of 
things done by communists and fas- 
cists, and again, if such people could 
be found they would be extremely 
atypical. Christie argues “In short, 
to what extent are differences in at- 
titudes between communists and 
major party members traceable to 
ideology per se and to what extent 
to other factors relating to political
activity?” It would, indeed, be in-
teresting to know the answer to this
question, but only someone excep-
tionally ignorant of conditions in
Britain at the moment would expect
it to be possible to find the answer
in this country.
There are other difficulties which 
make any ordinary kind of sampling 
procedure inapplicable in England. 
The number of fascists and com- 
munist party members in the whole 
country is usually considered to be 
less than 100,000; thus, it would take
a sample of 300-400 people to find a 
single communist or fascist. To get 
even the relatively small number of 
86 communists and fascists which 
formed our sample, it would require 
a random sample of some 25,000 peo- 
ple. When to this is added the secre- 
tiveness of fascist party members, 
who usually refuse to answer ques- 
tions, and the contempt of commu- 
nists for this type of work, and their 
consequent aversion to taking part in 
it, the impossibility of using orthodox 
methods should be even more obvi- 
ous. As if all this were not enough, 
there is in addition the difficulty that 
if one were not to make party mem- 
bership the criterion for acceptance 
of a person as being a communist or 
fascist, one would be left with no cri- 
terion at all. In the case of the major 
parties, identification was based on 
voting behavior. This is not applica- 
ble to the fascists as there were no 
fascist candidates during the elec- 
tion, and it is hardly applicable to 
the communists because communist 
candidates were standing only in a
very small number of highly atypi-
cal constituencies. Christie condemns
our method of sampling; he does not 
indicate how it could have been im- 
proved—even without taking into 
account the limitations imposed by a 
budget which never rose above, and 
frequently fell short of, the sum of 
100 dollars per annum. 
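
The sample-size arithmetic behind this paragraph can be made explicit. The sketch below takes the text's own figure of one communist or fascist in every 300-400 people at face value and simply multiplies it out; the function name `expected_sample_size` is hypothetical, and the calculation ignores refusals and nonresponse, which the text notes would make matters worse.

```python
def expected_sample_size(target_count, one_in):
    """Average random-sample size needed to reach target_count members
    of a group that occurs once in every one_in people."""
    return target_count * one_in

# 86 communists and fascists, at one per 300 and one per 400 people.
for one_in in (300, 400):
    print(one_in, expected_sample_size(86, one_in))
```

At one in 300 the figure comes to 25,800, in line with the "some 25,000" mentioned above; at one in 400 it would be larger still.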
It is here that our French study is
so important. In France, the com-
munist party is a mass party, with
sufficient members of a nonactive
character to make it comparable to
other parties, and to make possible
orthodox methods of sampling. When
this is done, as has been pointed out
above, the result shows even more
striking differences in the predicted
direction than were found in this
country. Thus, an improvement in
sampling procedures, as demanded by 
Christie, and an improvement in the 
scale used do not result, as would be 
predicted from his criticisms, in a 
lessening of the observed differences 
between communists and the ortho- 
dox political parties; quite on the 
contrary, the differences become much 
wider and much more significant. 
Christie might well reply that his 
criticisms were concerned with the 
studies reported in The Psychology of 
Politics, and that this new study is 
irrelevant. This, however, is not so. 
In all experiments which involve 
sampling, the investigator has to 
make certain decisions as to which 
factors are, and which are not, likely 
to influence the results, and in need
of experimental control. Similarly, 
the reader has to decide to what ex- 
tent he is willing to accept the inves- 
tigators’ judgment and to what ex- 
tent he is prepared to reject it. Even 
the best stratified sampling pro- 
cedure involves a decision as to the 
relevant variables which are to be 
used for the stratification. There are 
grounds here for legitimate disagree- 
ments. No random sampling pro- 
cedure fails to encounter the problem 
of nonresponders; no method of han- 
dling this is beyond criticism. In 
studies like the ones reported in The
Psychology of Politics, where random 
and stratified sampling could not be 
used in the orthodox manner, deci- 
sions have to be made by the investi-
gator with which the reader may dis-
agree legitimately. Only additional
investigations can settle issues which
otherwise must remain a matter of
opinion. In the writer’s view, the
sampling methods used in The Psy-
chology of Politics, while far from per-
fect, have adequately substantiated
the hypothesis under investigation.
According to Christie they have not.
The only way of deciding is not by
rather pointless argument, but by
further experiment.² It is the writer’s
view that the Nigniewitzky (10) ex- 
periment has settled the issue as far 
as the sampling controversy is con- 
cerned. 
A good deal of Christie's argument
is concerned with findings from Ameri-
can studies, which he believes con-
tradict our own findings. He appears
to believe that relationships between
social attitudes and personality fac-
tors depend upon the social setting.
“Any attempt to relate personality


² One of the criticisms made by Christie may
serve as an example of the kind of point on
which legitimate disagreements might arise.
The writer, having found that certain con-
trols, such as age, were uncorrelated with T in
his middle-class sample, did not consider it
necessary to impose these controls on his
working-class sample as this would have made
the investigation very much more expensive
and cumbersome. Christie argues that while
these were irrelevant in the middle-class
sample, there is no proof that they were ir-
relevant in the working-class sample, and that
consequently these controls should have been
imposed. This is a possible point of view. It
certainly would be more satisfactory if all
possible sources of variation could be con-
trolled in experiments of this kind. As this is
impossible, judgments have to be made as to
the relative importance of different aspects of
control. In the absence of any evi-
dence to the contrary, it seemed unlikely to
the writer that correlations between T on the
one hand and age, etc., on the other would be
very dissimilar in a working-class group as
compared with a middle-class group. Christie
cites some evidence to show that relation-
ships between attitudinal variables are differ-
ent in middle- and working-class samples, but
this, of course, is quite a different point; we
were concerned with correlations between
test scores and control variables. It may be
added, in parentheses, that in recent unpub-
lished work we have found relationships be-
tween T and the various control variables to
be very much the same in working-class as in
middle-class samples. This does not, of course,
invalidate the principle of Christie's criticism;
it merely illustrates that a criticism may be
perfectly legitimate without being neces-
sarily damaging to the conclusion arrived at.


variables to political ideology with- 
out taking the social context into ac- 
count is apt to be highly misleading 
as well as an oversimplification of 
some highly complex interrelation- 
ships.” The reader might not guess 
it from Christie’s comments, but this 
is almost precisely what the writer 
himself has pointed out in his book.
This is what he has to say. After 
pointing out that most of the work 
contained in The Psychology of Poli- 
tics was carried out in England, he 
goes on to say that “results from 
Germany and Sweden, as well as 
from the U.S.A., make it seem likely 
that the main conclusions drawn here 
would apply equally well there; it 
would not be wise, however, to gen- 
eralize too far.... This is particu- 
larly important when considering the 
personality structure of members of 
groups such as the fascist and com- 
munist parties. In our culture, these 
are minority groups; it is unlikely 
that conclusions based on members 
of such groups could be transferred 
without change to members of the 
Communist Party in the U.S.S.R., or 
to members of the former N.S.D.A.P. 
in Germany. When we talk about 
communists and fascists, therefore, it 
is about British communists and 
fascists we are talking, not about their 
foreign prototypes. At times the 
reader will undoubtedly be tempted 
to generalize beyond this restriction; 
if he does, he does so at his own peril” 
(italics not in original). Many of 
Christie’s arguments and criticisms 
are based on assumed similarities be- 
tween English and American condi- 
tions. He is free to indulge in these 
speculative exercises, but the writer 
should make it clear that they have 
little relevance to his own writings or 
views. Attempts have been made to 
extend our work to other countries 
like Spain (11), France (10), Sweden 
(7), Germany (3), the Near East (8),
and so forth; the accumulation of 
facts would appear more important 
than the armchair theorizing in which 
Christie delights. The reader of these
detailed reports may form his own 
views regarding the degree of cul- 
tural dependence of R and T. 

Eysenck’s reply (11) to a method-
ological critique (4) of his writings on
personality and politics appears, at
best, to be lacking in candidness. A
number of specific criticisms were
made of his work. He does not refer
to many of these. Others he at-
tempts to evade by distorting the
original criticism and giving irrele-
vant answers. This is a serious ac-
cusation. It may be best evaluated
by summarizing the original criti-
cisms and then considering his re-
sponses, if any, to them.
An initial comment should be
made. Eysenck says that the criti-
cism failed to “...deal with the
... development of the theories
and experiments outlined [in The
Psychology of Politics (9)]...” (11,
p. 431). The reason for not taking
his theories seriously is sim-
ple. Their basis is essentially an in-
ductive one.² They primarily rest
upon data collected by Eysenck and
his students. Other material which
gives support is cited; that which is
contradictory is slighted or ignored
(3). Errors in the collection, process-
ing, or analysis of data on the part of
Eysenck and his collaborators are
therefore extremely relevant for the
validity of his theories.
Although the temptation to rise to
some of Eysenck’s more irrelevant
sniping is tantalizing, scientific criti-
cism is best served by returning the
argument to the level of fact. The


¹ The title of this paper is based, appropri-
ately enough, upon that of a book by H. J.
Eysenck (8).

² This is not intended as a critical remark.
The writer is favorably disposed toward a
truly inductive approach at the present state
of the development of social psychology.

procedure followed in the critique 
of systematically evaluating method- 
ological flaws in Eysenck’s work will 
be followed for the sake of simplicity 
and comprehensiveness. 


THE T SCALE AND “TOUGH-
MINDEDNESS”


Sampling. An analysis of Eysenck’s
samples of middle-class supporters of 
various political parties was made. 
It was concluded that they were non- 
representative, as evidenced by gross 
discrepancies in age and education 
between them and estimates of the 
British middle class based upon Brit- 
ish census data. Eysenck agrees with 
this conclusion but regards a syste- 
matic attempt at indicating the ex- 
tent of the bias as a “task of super- 
erogation” (11, p. 434). 

Eysenck was not criticized for us- 
ing available samples or for advocat- 
ing properly conducted analytic sam- 
pling. He was criticized for main- 
taining that scores of, e.g., univer- 
sity-educated, older, middle-class, 
male Liberals, as sampled by him, 
could be projected to the parent pop- 
ulation of individuals meeting these 
criteria in Great Britain. 

It was pointed out that Eysenck’s 
students tended to collect question- 
naires from individuals who were pre- 
sumably most like them, i.e., young 
and highly educated. At best, one of 
twenty among the British middle- 
class population have had a smatter-
ing of university education; well over
half of Eysenck’s sample have so ben-
efitted (4, p. 415). Half of his sample
were under 30 years of age as con-
trasted with but a fifth of the British
adult population (4, p. 414).

Furthermore, Eysenck’s students 
gave questionnaires to friends or 
acquaintances. At best, the parent 
population of his samples can be de- 
fined as consisting of only those indi- 
viduals who were known by students 
of Eysenck’s. Strictly speaking, sta- 
tistical generalizations cannot be 
made to even this highly restricted 
parent population since there is no 
evidence that there was random selec- 
tion of respondents within this pool 
of potential subjects. Lindquist (14, 
pp. 73-74) has a clear discussion of 
the pitfalls involved in generaliza- 
tions based upon nonrandomly drawn 
samples. 

There is therefore no justification 
for Eysenck’s suggestions (5, p. 57,
9, p. 127) that the test scores of his 
admittedly nonrepresentative sam- 
ples can be projected to obtain a 
meaningful estimate of scores of the 
British population.

No evidence whatsoever was given 
as to how Eysenck select 


ese were selected 


sis of being known 
by Eysenck’s students at London
University. If anyone maintains that
the scores of these 27 individuals can
be meaningfully projected to all
working-class members of the Liberal
Party in Great Britain he may be
even more “exceptionally ignorant of
conditions in Britain” than this critic.

It was also pointed out that Ey- 
senck’s communist sample was se- 
lected from an active political or- 
ganization. They, by definition, were 
active group members and in this 
sense differed from the majority of
the population. The question was
then asked as to “... whether they
[Communists] are less different from 
those who are politically active in 
major parties than from those who 
merely list themselves in a particular 
way when asked to do so” (4, p. 
413). The question raised was: are 
the T-scale scores of communists 
(by definition, politically active) 
more similar to those of active mem- 
bers of major political parties than 
to those of inactive members of major 
political parties? Eysenck does not
address himself to this question. In-
stead he repeats the point made in
criticism that there are no inactive
communists and then concludes that
it follows that the comparison is im-
possible!³


Finally, Eysenck’s description of
Coulter's sample of working-class
males in the British Army as a
“random sample of the British work-
ing-class” was questioned. The
absurdity of this statement was
pointed out. The test scores made
by these men were compared with
Eysenck’s other sample of urban
working-class males and significant
differences between the mean scores
of the two groups on the R (radical-
ism) scale were found. Neither sam-
ple was representative. Differences
in test scores indicated they were not
drawn from the same parent popula-


³ Melvin compared the “tender-minded-
ness” scores of members of his sample who
listed themselves as “active in politics” (no
definition of activity other than respondents’
self-classification) with those sample members
who did not so list themselves (15, Fig.
following p. 329). No differences emerged.
Such a finding is directly relevant to the origi-
nal criticism. If Eysenck had cited it, the bur-
den of rejoining (that such a criterion is
vague) would have fallen upon the critic.
Instead of citing relevant evidence, however,
Eysenck perverted the logic of the argument.
tion. Eysenck does not produce any
new evidence to rebut the analysis; 
indeed, he does not mention this 
aspect of the criticism. 

Eysenck has not chosen to give a 
rebuttal to any of the specific criti- 
cisms made of either his generaliza- 
tions which were based upon unrep- 
resentative samples or of his compari- 
sons of samples which differed in the 
way they were drawn from the pre-
sumed parent population. He at-
tempts to evade the issue by imply-
ing that criticism was directed solely
toward his sampling procedures rather
than the generalizations based upon
them. He says, “We know of no
complex study in social psychology
which has handled this problem
[sampling] with complete ade-
quacy” (11, p. 434). Although
the writer disagrees, the point is
irrelevant. What is relevant is the
degree of caution with which gen-
eralizations are made from samples
which cannot be randomly drawn
from the parent population. Al-
mond’s study of communist defectors
is an example of scientific re-
straint in such a situation.

Measurement. Attention was di-
rected toward the bias caused by
Eysenck’s treatment of the “no-
answer” category. It was pointed
out that its always being scored as
“tough-minded” led to the seemingly
paradoxical situation where a person
who had no opinion on anything
emerged as the epitome of “tough-
mindedness” whereas someone who
agreed with everything was a para-
gon of “tender-mindedness.” Eysenck
does not choose to give any rationale
for the unique procedures which
could lead to such nonsensical results.


⁴ Among the studies cited in the reference
list in the critique, that of Stouffer (18)
handles the problem of sampling in exemplary
fashion.
It was also pointed out that the
specific comparisons between mem- 
bers of the three major political par- 
ties and the two deviant groups 
(communists and fascists) were af- 
fected by this scoring procedure. It 
was inferred from bits of data pre- 
sented in Eysenck’s writings that 
members of the major parties were 
characterized by a higher frequency 
of “no-answer” responses than mem-
bers of the two deviant parties. As 
Eysenck correctly notes in his rebut- 
tal, this loaded the dice against his 
“hypothesis.” The amount of error 
introduced by this peculiarity of the 
scoring system is so much less than 
that arising from some of the mis- 
takes in addition (16) and highly 
aberrant methods of analysis (4) that 
it does not materially strengthen 
Eysenck's position. This is the sole 
comment Eysenck makes about his 
strange treatment of the “no-answer”
category, and does not clarify the 
basic issues involved. 

In examining the asymmetric dis- 
tribution of items in the four quad- 
rants of Eysenck’s two-factor space 
it was pointed out that the peculiarities
of the scoring system were such that 
a hypothetically consistent commu- 
nist would automatically be less 
“tough-minded” by one point than 
a hypothetically consistent fascist. 
Eysenck’s reply is a model of dis-
ingenuousness: “As we have through-
out found communists to be slightly 
less tough-minded than fascists, 
Christie’s argument would suggest 
that, in fact, we should increase the 
communists’ scores by one point, 
thus making them even more like the 
fascists than appears in our results” 
(11, p. 433). How the criticism could 
possibly suggest to Eysenck that in- 
creasing communists’ scores by one 
point could equate for the inade-
quacies of the scoring system is most
puzzling. The T scale is so scored
that the higher the score, the greater
the purported “tender-mindedness.”
Adding a point to the scores of com- 
munists would therefore have the 
opposite effect from that postulated 
by Eysenck—it would increase the dif- 
ference between communists’ and fas- 
cists’ scores on the T scale! Aside 
from this non sequitur, Eysenck’s 
finding that communists are more 
“tender-minded” (by from roughly 
one to two points depending on whose 
addition is used) is partially due to 
the biases resulting from his scoring 
system. Actually, since his com- 
munist sample did not respond to 
T-scale items as Eysenck hypothe- 
sized they should, the bias leads to
an unknown error in the scoring sys-
tem.


The criticism relating to the asym- 
metric distribution of 14 items in 
four quadrants is not rebutted by 
Eysenck’s discussion of a symmetrical 
two-item scale.⁵ This has nothing to


⁵ He says, “The first item relating, say, to trial
marriage has a loading of +.6 on R and +.5
on T; the other relating, say, to the death
penalty has a loading of -.6 on R and +.5 on
T” (11, p. 432). The items referred to are
Nos. 29 and 36 of his version of the T scale
(9, Table XVIII, p. 123). The actual loadings
of these items according to Eysenck’s re-
ported findings are -.53 and +.56, -.60 and
-.20 respectively (9, Table XX, p. 129). Both
items are loaded on the radical side of R; the
first is “tough-minded” and the second
“tender-minded.” Eysenck goes on to say, “A
person saying ‘Yes’ to both questions would
therefore get a score on the T scale of 2, a
person answering ‘No’ to both questions
would get a score of zero” (11, p. 432). Ac-
cording to Eysenck’s scoring system, however,
they would both get scores of one on the T scale
but for different reasons. A person accepting
both would be given a point for his affirmative
response to Item No. 36; a person rejecting
both would be given a point for not accepting
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do with the problem of asymmetry 
which exists in both Eysenck’s and 
Melvin’s scales. It may be that this 
is one of the points that Eysenck 
claims to have answered in his reply 
to Rokeach and Hanley (11, Foot- 
note 1, p. 431). There Eysenck says, 
“The T score combines in equal pro- 
Portions radical and conservative 
items and thus gets rid of the compli- 
cation introduced by the R fac- 
tor...” (10, p. 180). 

This statement is irrelevant. An 
examination of Fig. 1 in the critique 
(4, p. 419) indicates that seven T-scale 
items are saturated with radicalism 
and seven are loaded on the conserva- 
tive end of the axis. This does not 
get rid of complications because of 
the asymmetry of the dispersal of 
the items in the quadrants which was 
the point made in criticism. ; 

Eysenck nowhere in his writings 
discusses the rationale for having an 
asymmetric distribution of the T- 
scale items (five, four, three, and 
two) in the four quadrants of his fac- 
tor space. Let us accept for a mo- 
ment his ground rules for the distribu- 
tion of items—namely that there 


o 


Item No. 29. 


Further, “A person high on R would y 
“Yes' to the first and ‘No’ to the second item; 
a person low on R would reverse this” (11, P: 
432), indicates Eysenck’s confusion about 
items in his own scale and their scoring: 
Person high on R should accept both state- 
ments and one low on R should reject them 
if scored according to his procedure. 

It is suggested that Eysenck is forced to 
resort to the use of “imaginary” values Þe- 
Cause there are not, in fact, items with em- 
pirical loadings on T and R which are 50 
balanced that the variances will be cancele 
out as in his example. The actual loadings °” 
T and R are extremely crucial to the problem 
of asymmetry of item distribution. Eysenck t 

imaginary” loadings are not in agreemen 
with his findings but are given arbitrary values 
which support the argument which he ad- 
vances in attempting to evade criticism. 
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should be a balance between radical 
and conservative items in the T 
scale. Eysenck objects to ‘‘specula- 
tive exercises” based upon his data 
and for very good reasons as shall be 
demonstrated. The exercises to be 
Presented are designed simply to 
show the absurdities arising from 
taking Eysenck’s approach seriously. 
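
The scoring point made in the footnote on the two-item example can be checked mechanically. The following is only a sketch: it assumes, as the text describes, that one point is given for accepting a “tender-minded” item (No. 36) and one point for rejecting a “tough-minded” item (No. 29). The names `ITEM_KEYS` and `t_score` are illustrative and are not taken from Eysenck.

```python
# Keying assumed from the text: Item 29 is "tough-minded", Item 36 is "tender-minded".
ITEM_KEYS = {29: "tough", 36: "tender"}


def t_score(responses):
    """Return a T-scale score; a higher score means more 'tender-minded'.

    responses maps item number -> True (accepts) or False (rejects).
    """
    score = 0
    for item, accepted in responses.items():
        key = ITEM_KEYS[item]
        if key == "tender" and accepted:
            score += 1  # point for endorsing a tender-minded item
        elif key == "tough" and not accepted:
            score += 1  # point for rejecting a tough-minded item
    return score


accept_both = t_score({29: True, 36: True})
reject_both = t_score({29: False, 36: False})
print(accept_both, reject_both)
```

Under this keying the person who accepts both items and the person who rejects both receive the same score, each for a different reason, which is the footnote's objection.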
First, let us assume that a communist came across Eysenck’s material and accepted his criteria for balancing items in the T scale. With relatively little study this person could deduce that the responses to the items in the various quadrants of Eysenck’s factor space show fairly stable patterns of the degree of acceptance by members of Eysenck’s samples of adherents of various political parties. If, for some reason, this person wished to demonstrate that his middle-class comrades (as sampled by Eysenck) were really not “tough-minded” (indeed, that they were more “tender-minded” than Eysenck’s sample of conservatives), he could do so very simply by deleting one of the four items in the radical “tough-minded” quadrant and adding one in the radical “tender-minded” quadrant. Eysenck’s scoring system does not make communists “tender-minded” when they accept radical “tough-minded” items (as is their wont). It does make them “tender-minded” when they accept the “tender-minded” radical items (as is their wont). Such a substitution of items, which is completely in accord with Eysenck’s specifications, serves to make the communists more “tender-minded.” It also makes the conservative sample somewhat less “tender-minded.” Indeed, as shown in Table 1, the conservatives are now more “tough-minded” than the communists!
Let us speculate even further. Assume a conservative wishes to “prove” that communists are even more “tough-minded” (especially when compared to conservatives) than Eysenck’s figures indicate and that he has caught on to the fine art of juggling items. Staying within the ground rules, he then deletes three items from the conservative “tough-minded” quadrant (to which some of Eysenck’s conservative sample were receptive) and adds three items to the conservative “tender-minded” quadrant (which conservatives, appropriately enough, accept). The results of such a manipulation make the conservatives even less “tough-minded” relative to the communists than does Eysenck’s own procedure, as indicated in Table 1.
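
The two item-juggling exercises can be illustrated with a small calculation. This is a sketch under stated assumptions, not a reproduction of Table 1. The quadrant counts (four radical “tough-minded,” three radical “tender-minded,” five conservative “tough-minded,” two conservative “tender-minded”) are inferred from the text. The acceptance rates are purely hypothetical values chosen for illustration and are not Eysenck’s data.

```python
# Keying of each quadrant's items on T.
KEYING = {
    "radical_tough": "tough",
    "radical_tender": "tender",
    "conservative_tough": "tough",
    "conservative_tender": "tender",
}

# Item counts per quadrant, inferred from the text (five, four, three, two).
BASELINE = {
    "radical_tough": 4,
    "radical_tender": 3,
    "conservative_tough": 5,
    "conservative_tender": 2,
}

# HYPOTHETICAL mean acceptance rates per quadrant (illustration only).
ACCEPTANCE = {
    "communists": {"radical_tough": 0.8, "radical_tender": 0.8,
                   "conservative_tough": 0.2, "conservative_tender": 0.2},
    "conservatives": {"radical_tough": 0.3, "radical_tender": 0.4,
                      "conservative_tough": 0.5, "conservative_tender": 0.8},
}


def expected_t(group, counts):
    """Expected T score: accepting tender items and rejecting tough items score."""
    total = 0.0
    for quadrant, n_items in counts.items():
        p = ACCEPTANCE[group][quadrant]
        total += n_items * (p if KEYING[quadrant] == "tender" else 1.0 - p)
    return total


def gap(counts):
    """Conservative mean minus communist mean (positive: communists look 'tougher')."""
    return expected_t("conservatives", counts) - expected_t("communists", counts)


def shift(counts, source, target, n):
    """Delete n items from one quadrant and add n items to another."""
    new = dict(counts)
    new[source] -= n
    new[target] += n
    return new


exercise_1 = shift(BASELINE, "radical_tough", "radical_tender", 1)
exercise_2 = shift(BASELINE, "conservative_tough", "conservative_tender", 3)

for name, counts in [("baseline", BASELINE),
                     ("exercise 1", exercise_1),
                     ("exercise 2", exercise_2)]:
    print(name, round(gap(counts), 2))
```

With these illustrative rates the sign of the difference reverses in the first exercise and the difference widens in the second, although each arrangement still has seven radical and seven conservative items.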

The preceding comparisons could 
have been made even more grotesque 
if the deletion and substitution of 
items had been based upon the ac- 
tual responses to individual items as 
reported by Eysenck (7, Table III, 
p. 200) rather than by manipulating 
the means of items in the quadrants. 

The point made by these “specula- 
tive exercises” is simple. The re- 
quirements for the T scale which 
Eysenck has stipulated are meaning- 
less. His original scale was so con- 
structed and scored that the means 
reported for those affiliated with vari- 
ous political parties (even when his 
arithmetic is corrected insofar as 
possible) represent the peculiarities
of the biases built into it rather than
the positions of respondents along
any meaningful measure of “tough-
mindedness.”

6 In many ways it would have been more [interesting] had the T-scale items been equally distributed among the four quadrants. If this had been done, certain interesting consequences for Eysenck’s speculations would follow. Computations along the line indicated in Table 1 indicate that in agreement with Eysenck’s and Rokeach and Hanley’s computations, middle-class Communists sampled by Eysenck are more “tough-minded” than middle-class members of major political parties. Contrary to findings using Eysenck’s distribution of items in quadrants, however, working-class Socialists would be more “tough-minded” than working-class Communists.

Of even greater interest, however, is the great stress which Eysenck’s theorizing (9, pp. 259-260) places upon his finding that in every political party working-class samples are more “tough-minded” than are middle-class samples. If the items had been symmetrically distributed it would have been found that working-class Communists and Conservatives were less “tough-minded” than middle-class followers of these parties. The relative position on the T scale of working- and middle-class samples of Liberal and Socialist parties would remain the same. This is but one example of the effect which artifacts in measurement have upon Eysenck’s theory building.

It would have been even more appropriate to the criticism to examine the effects of Eysenck’s artifacts on the scores of fascists as compared with communists and major party members. Data on fascists are not available. However, it is clear that the present demonstration of the ease with which Eysenck’s scoring system can be manipulated to produce contradictory results by a simple shifting of scale items is a sufficient reason for viewing his conclusions as being partially the result of illogical and unjustified procedures in scoring items.

An analysis of the responses of communists to specific T-scale items in the four quadrants indicated that they accepted or rejected items along a radicalism and conservatism dimension. If an item had a saturation on the radical dimension communists accepted it whether it was “tough-” or “tender-minded.” The analysis indicated that no prediction could be made as to whether communists would accept T-scale items if only their saturation on T was known but that in every case in which their saturation on R was known a perfect prediction could be made as to whether communists as a group would accept or reject the item. This is the basis for making the statement that the T scale “does not apply” to a certain group, in this case, members of the Communist Party. It was, perhaps erroneously, believed that the statement which Eysenck “does not understand at all” (11, p. 433) was clear in its original context. Evidently it was not and it shall be spelled out. Eysenck’s own data unequivocally indicate that communists respond to T-scale items according to their saturation on R and not to their saturation on T. It was therefore argued that the T scale, as scored by Eysenck, does not measure “tough-mindedness” among the members of the Communist Party whom he sampled. This is the basic point and Eysenck’s discussion of the statistical reliability of the T scale in Coulter’s sample of communists (11, p. 433) evades the issue. The issue is validity and not reliability. Is it too much to assume that Eysenck knows that it is possible to have a reliable scale which has no validity?
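
The distinction between reliability and validity can be illustrated with a small simulation. This is a minimal sketch under assumed conditions, not an analysis of Eysenck’s or Coulter’s data: every simulated item response is driven by a latent R plus random noise and is entirely unrelated to a separate latent T. The function names, sample size, number of items, and seed are all hypothetical choices.

```python
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n_people=2000, n_items=10, seed=0):
    """Return (split-half correlation, correlation of total score with T)."""
    rng = random.Random(seed)
    half_a, half_b, totals, t_trait = [], [], [], []
    for _ in range(n_people):
        r = rng.gauss(0, 1)  # latent radicalism
        t = rng.gauss(0, 1)  # latent trait the scale is supposed to measure
        # Items respond to R only; T plays no part in the responses.
        items = [r + rng.gauss(0, 1) for _ in range(n_items)]
        a, b = sum(items[0::2]), sum(items[1::2])
        half_a.append(a)
        half_b.append(b)
        totals.append(a + b)
        t_trait.append(t)
    return pearson(half_a, half_b), pearson(totals, t_trait)


reliability, validity = simulate()
print(round(reliability, 2), round(validity, 2))
```

In such a setup the two halves of the scale agree closely with each other, while the total score is essentially uncorrelated with the trait it is named for.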

Analysis. In his rejoinder, Eysenck 
does not touch upon the critical com- 
ments made about his use of “average 
of average” scores rather than using 
conventional statistical techniques. 
He does not justify his arbitrary 
lumping together of fascists and com- 
munists when this violates the data. 
It is therefore not clear why he 
chooses as part of the title of his re- 
joinder, “... the Personality Simi- 
larities Between Fascists and Com- 
munists.” His data do not suggest 
similarities but rather dissimilarities, 
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Tue F SCALE AND ‘“‘AUTHORI- 
TARIANISM” 


Eysenck’s statement that the F scale 
measured authoritarianism rather 
than potential fascism was examined. 
It was concluded that such a state-
ment was completely unjustifiable
in terms of other research and that 
Eysenck’s own data indicated the 
opposite. The only possible basis for 
such an assertion was the work of 
Coulter in which an alleged politi- 
cally “neutral” group made an un- 
precedentedly low score on the F 
scale. Evidence was presented which 
clearly indicated that this particular 
sample was a highly aberrant one 
when their other test scores, as re- 
ported by Eysenck, were evaluated. 

Eysenck’s only reply to this portion of the critique is his refusal to acknowledge the relevance of data collected in Great Britain by Rokeach. He is correct in objecting to a reference to material which had (at the time the critique was written) been submitted but not accepted for publication. Since Rokeach’s monograph is now in press, the reader will be able to evaluate Rokeach’s finding that the British college students sampled by Rokeach have lower F-scale scores than do his sample of British workers (17).

The reasons for rejecting Eysenck’s argument that communists score high on the F scale were indicated in the critique. Since other of Rokeach’s data bear directly upon this point, they are worthy of reproduction. Rokeach gave the F scale (1, pp. 255-257) to students at London University. The respondents were asked to indicate their political preferences. Table 2 indicates the results.


TABLE 2

F-SCALE SCORES OF STUDENTS IDENTIFYING
WITH VARIOUS POLITICAL PARTIES*

Party Identification     N     Item-mean Score
Conservative            54     3.98
Liberal                 22     3.39
Labor (Atleeites)       27     3.51
Labor (Bevanites)       19     3.12
Communist               13     2.86

* Item means computed from (17, Table 13, p. 34).

Completely contrary to Eysenck's 
statement about communist scores on 
the F scale, the communistically in-
clined students at the University of
London, where Eysenck teaches,
scored lowest on the F scale. This 
finding is in complete accord with 
some twenty years of research on 
similar types of scales as was indi- 
cated in the critique. 

It is once more concluded that 
Eysenck's attempted equation of F- 
scale scores and authoritarianism 
based upon Coulter’s samples is com- 
pletely out of line with all available 
data including his own. 


FURTHER REMARKS 


The previous discussion has dealt 
with the specific points made in the 
critique and Eysenck’s failure to 
answer them adequately. In evading 
these criticisms, Eysenck raises many 
points which require clarification.
Two will be singled out for comment:
his discussion of “pure” T-scale items
and his use of research other than his
own to support his speculations.

“Pure” T-scale items. An impor-
tant aspect of Eysenck’s theorizing is
that, “The T-factor itself ... is rather
the projection on to the social atti-
tude field of a set of personality vari-
ables” (9, p. 170). As noted in the
critique, this conclusion of Eysenck’s
might well have resulted from the
fact that his procedure of collecting
items from existing scales could not
have possibly uncovered “pure” meas-
ures of T since such were not included
in the scales subjected to pruning. 
It was suggested that, “If, instead 
of taking items which had been of 
relevance in previous research, he 
had analyzed the definition of tough- 
mindedness and then selected, in- 
vented, or modified items which ap- 
peared relevant and then factor an- 
alyzed responses to them and other 
items he might well have isolated a 
much purer dimension of ‘tough-
tender-mindedness’” (4, pp. 427-428).
Eysenck apparently agrees with
this criticism of his original pro-
cedure since he quotes the statement
and then says, “Having attempted
for many years to do what Christie
advocates, and having had several
students make similar attempts, all
without success, the writer believes
that Christie is somewhat optimistic”
(11, p. 432).
An examination of Eysenck’s published work does not indicate a single instance of his ever having offered an analysis of “tough-mindedness.” In his earliest work in this area the items were described as forming a “practical-theoretical” dichotomy and Eysenck later noted that the interpretation of the factor was entirely subjective (9, p. 119). It was in the report of work done upon the basic middle-class sample (5) that the terms “tough-” and “tender-minded” were first applied to the factor which appeared to remain after the radical-conservative axis was extracted. No formal definition was given but comparisons between the implicit content of the items and some of William James’s comments about the tender-minded and the tough-minded have been made by Eysenck (9, p. 131). Such an identification is extremely tenuous. If we were to take Eysenck’s usage of James seriously we would have to conclude that communists and fascists were not dogmatic (since dogmatism is a Jamesian tender-minded trait and according to Eysenck communists and fascists are “tough-minded”)!

No evidence is presented in sup- 
port of Eysenck’s contention, nor 
does he indicate which of his students 
have attempted to undertake an an- 
alysis of the term “tough-minded- 
ness.” Melvin, whose scales of R 
and T have been cited by Eysenck as 
being “improved” versions (9, p. 
132), is the only student known to 
have done any other work on the 
construction of the T scale.⁷ This is
his description of his approach: 

The most logical way to begin this search 
would be to make a theoretical analysis of the 
concept of tendermindedness and then make 
formal deductions to a series of hypotheses 
about its verbal manifestations. This proce- 
dure was considered, but it soon became clear 
that it was difficult to arrive at any conclu- 
sions about the essential psychological nature 
of T by pure thought alone, and that a strictly 
formal approach would have to be abandoned 
(15, p. 122, italics in original). 


Melvin’s basic procedure was 
Eysenckian. He examined attitude 
scales published since 1947 in search 
of items (Eysenck had gone over 
earlier scales). In addition, however, 
he gathered items from the expressed 
opinions of minority group members 
and publications, from a political en- 
cyclopedia, and other new items were 
originated upon the basis of discus- 
sions with Eysenck. His pool of 239 
items was an adequate sampling of 
existing scales and also contained 
original material in contradistinction 
to Eysenck’s original 40 items (4, p.
427).


7 Melvin’s thesis (15) was mentioned in the critique as being unknown at the University of London Library. It has been filed since the critique was submitted for publication (personal communication from the University of London Library, Aug. 1955).

It is therefore not surprising that such an essentially empirical approach, although conducted with methodological sophistication, led to the discovery of relatively few items which even hinted at the existence of “pure” T items. Seven items were found which had negligible loadings on R (.08 or less) and modest loadings on T (.20 or more) (15, Appendices A-D, pp. i-xxviii).

In his concluding chapter, Melvin 
says, “The difficulty noted... of 
obtaining valid Tendermindedness 
scores for toughminded-radicals raises 
another urgent problem. This might 
well be approached along similar 
lines to those adopted by the authors 
of The Authoritarian Personality . . . 
in their development of the California 
F-scale” (15, p. 344).

The inferences to be drawn from 
Melvin’s work are: (a) his approach 
was essentially empirical and did not 
follow from any rigorous analysis of 
the definition of “tough-mindedness,” 
and (b) he apparently believes that 
valid T-scale items might be obtained 
by a more theoretically oriented ap-
proach.

Eysenck’s claim that he and his students have been attempting for years to do research based upon an analysis of the definition of T is not supported by any known published or unpublished material. Indeed, the most recent and most relevant thesis, done under Eysenck’s own supervision, suggests that such an approach be tried!

It is a moot point whether or not “pure” T items could be uncovered. To repeat the point made in the critique, the procedures used by Eysenck and his students could not possibly have uncovered many of them, if they did exist, because no formal attempt was made to define what they were looking for and the selection of items was limited primarily to those used by other investigators.

Eysenck's use of supporting re- 
search. In his replies to Rokeach and 
Hanley and to this critic, Eysenck 
has not chosen to answer specific 
criticisms about his methodology. In- 
stead he has preferred to rely upon 
references to unpublished theses of 
his students. 

There are three studies done by 
Eysenck and his students which have 
contained comparisons between com- 
munist, fascist, and other political 
party samples’ scores upon the T 
scale. The first of these was the 1951 
study of Eysenck (7). Flaws in this 
study were pointed out and these 
were not directly answered in Ey-
senck’s reply.

The second was an unpublished 
thesis by Coulter which Eysenck 
referred to in his reply to Rokeach 
and Hanley as supporting his posi- 
tion. This critic raised questions 
about Coulter’s thesis which, if cor- 
rect, invalidated it as a meaningful 
comparison of the “tough-minded- 
ness” of communists, fascists, and 
major party members. Eysenck does 
not mention, let alone attempt to 
answer these criticisms in his reply.
He also does not again cite this thesis
in support of his position in his reply.

The third study was an unpub- 
lished thesis by Nigniewitzky. In 
his reply to this critic, Eysenck 
places primary reliance upon it and 
anticipates that it might be consid- 
ered irrelevant to criticisms based 
upon earlier studies (11, p. 435). Such an anticipation is correct. Even if Nigniewitzky’s data could withstand critical methodological scrutiny there would be no justification for the many errors made by Eysenck. If Nigniewitzky’s results are correctly reported by Eysenck, the latter is placed in the position of having blundered onto the truth despite a concatenation of critical mistakes.

The temporary unavailability of 
Nigniewitzky’s thesis⁸ makes it im-
possible to determine whether it is
relevant to the following statement
by Eysenck, “Thus, an improvement
in sampling procedures, as demanded
[sic] by Christie, and an improve-
ment in the scale used do not result,
as would be predicted from his crit-
icisms, in a lessening of the observed
differences between communists and 
the orthodox political parties..." 
(11, p. 436, his italics). 

It is unclear from Eysenck's state- 
ments what sort of a sample was 
utilized by Nigniewitzky. According 
to Eysenck, when replying to Rokeach 
and Hanley, it was “... a properly stratified sample of the French population...” (10, p. 178). According to Eysenck, when replying to this writer it was “... a properly selected representative sample of the French middle-class population” (11, p. 435, italics added).

8 The only possible way to get copies of theses from the University of London Library is to request that a microfilm copy be prepared. The person requesting such a copy must pay for it and sign a statement that it will not be quoted without written permission of the author. This procedure was followed in the case of Coulter’s thesis and it required some five months from the initial inquiry until the microfilm copy was received.

In view of the short period of time available for a reply to Eysenck it did not appear feasible to wait for a microfilm copy of Nigniewitzky’s thesis nor was it certain that quotations would be permitted. Following Eysenck’s remarks in Footnote 2 of his reply (11, p. 437), the writer requested copies of both Melvin’s and Nigniewitzky’s theses. A copy of Melvin’s thesis was graciously forwarded. Eysenck said that a copy of Nigniewitzky’s thesis had not yet been received from the University of London Library. At the same time (March 7, 1956) Nigniewitzky was also sent a letter requesting a copy of his thesis. No reply has been received as of the time this article was submitted for publication.

It is also unclear from what Ey- 
senck says what improvements in the 
T scale were made and what rele- 
vance these might have to the criti- 
cisms made of the earlier version. 
The only “improved” versions of the 
T scale mentioned by Eysenck in 
other contexts are those by Melvin. 
The problem of unequal distribution 
of T-scale items in the four quadrants 
also exists in Melvin’s two scales 
since they both had ten items in each
of the two “tough-minded” quad-
rants and six in each of the two
“tender-minded” quadrants (15, Ap-
pendices A-D, pp. i-xxviii). This dis-
tribution, combined with either of
the two scoring systems discussed by 
Melvin would not alleviate the 
sources of bias discussed in compar- 
ing communists with members of 
major political parties on the T 
scale. Unlike Eysenck, Melvin recog- 
nizes this problem and discusses in 
his thesis the then unsolved problem 
of communists responding to T-scale 
items in terms of their loading on R 
rather than T (15, pp. 219-225). 

Aside from a lack of satisfactory 
detail, Eysenck’s remarks about the 
greater differences found between 
communists and major party mem- 
bers in France than in England 
evade the issue and are completely
irrelevant. Gross methodological
errors in Eysenck’s and Coulter’s 
studies of British party members 
made their comparisons meaningless. 
It therefore follows that no predic- 
tion whatsoever can be made as to 
whether or not a new study (using 
proper sampling and measurement 
procedures) would show an increase 
or lessening in relative differences of 
parties along the T scale when com-
pared with their results.

It would be unjust to prejudge
Nigniewitzky’s thesis upon the basis
of Eysenck’s ambiguous remarks. It
would be unwise to accept the lat-
ter’s statements about it at face
value, however, since Eysenck
“... cannot agree with any of the
major criticisms put forward...”
(11, p. 431) of his own work and
nowhere indicates an awareness of
the implications of his many method-
ological excesses.⁹
Eysenck concludes his flight from 
criticism by inviting the reader to ex- 
amine five studies using the T scale 
carried out in countries other than 
Great Britain. The suggestion is 
irrelevant as far as criticism of 
Eysenck’s procedures is concerned. 
The prospective reader should be re- 
minded that with the presumed ex- 
ception of Nigniewitzky’s study, all 
of them utilized Eysenck’s original T 
scale. Their interpretation is there- 
fore subject to all the cautions neces- 
sitated in evaluating results obtained 
with this unique “measurement” in- 
strument. 
[Eysenck’s dependence upon research other than] his own fails
[to answer the methodo]logical [questions] raised about his
own work. In those instances where
such research has been available for 
examination, it does not support 
Eysenck but confirms the criticisms 
made or is irrelevant to them. 


9 For an amusing earlier demonstration of the same point see the comments by Greenal (12) and Eysenck’s reply (6).


CONCLUSION

Eysenck contends that communists and fascists are more “tough-minded” and “authoritarian” than are members of major political parties. This plausible assumption turns out, upon critical inspection, to be based upon errors of computation, uniquely biased samples which forbid any generalizations, scales with built-in biases which do not measure what they purport to measure, unexplained inconsistencies within the data, misinterpretations and contradictions of the relevant research of others, and unjustifiable manipulations of the data. Any one of Eysenck’s many errors is sufficient to raise serious questions about the validity of his conclusions. In toto, absurdity is compounded upon absurdity, so that where, if anywhere, the truth lies is impossible to determine.

It had been hoped that Eysenck's 
reply to specific criticisms would be 
directed toward acknowledging their 
relevance or rebutting them. If this 
had been done our exchange would 
have served to clarify problems and 
sharpen legitimate points of differ- 
ence. Instead, Eysenck does not rebut 
a single specific criticism.

Eysenck’s responses to these critical points which he takes note of invariably evade the specific issue. Reliance is placed upon an extensive citation of the research of others. Those that are available do not support his position but indicate the cogency of the criticism.

This critic rests his case. It is believed that the detailed and perhaps tedious documentation of Eysenck’s scientific sins of omission and commission is sufficient to raise grave doubts about the validity of his conclusions. The reader is invited to decide for himself whether or not Eysenck’s many methodological errors and his evasions of specific criticisms constitute abuses of psychology.


THE QUANTITATIVE STUDY OF SHAPE AND
PATTERN PERCEPTION¹


FRED ATTNEAVE AND MALCOLM D. ARNOULT
Skill Components Research Laboratory, Air Force Personnel and Training Research Center 


The pre-eminent importance of 
formal or relational factors in per- 
ception has been abundantly demon- 
strated during some forty years of 
gestalt psychology. It seems extra- 
ordinary, therefore, that so little 
progress has been made (and, indeed, 
that so little effort has been ex- 
pended) toward the systematizing 
and quantifying of such factors. Our 
most precise knowledge of perception 
is in those areas which have yielded 
to psychophysical analysis (e.g., the 
perception of size, color, and pitch), 
but there is virtually no psycho- 
physics of shape or pattern. 

Several difficulties may be pointed 
out at once: (a) Shape is a multidi- 
mensional variable, though it is often 
carelessly referred to as a “dimen- 
sion,” along with brightness, hue, 
area, and the like. (b) The number
of dimensions necessary to describe 
a shape is not fixed or constant, but 
increases with the complexity of the 
shape. (c) Even if we know how 
many dimensions are necessary in a 
given case, the choice of particular 
descriptive terms (i.e., of reference- 
axes in the multidimensional space 
with which we are dealing) remains 
a problem; presumably some such
terms have more psychological mean-
ingfulness than others.


1 This research was carried out at the Skill Components Research Laboratory, Air Force Personnel and Training Research Center, Lackland Air Force Base, San Antonio, Texas, in support of Project 7706, Task 27001. Permission is granted for reproduction, translation, publication, use, and disposal in whole or in part by or for the United States Government.


The need for an adequate psycho- 
physical framework is most obvious 
in those studies (having to do with 
discrimination, for example, or with 
positive or negative transfer) in 
which it is necessary to manipulate 
shape or pattern as an independent 
variable. Unless some meaningful 
units of variation are specifiable, 
functional relationships cannot be 
obtained. It is somewhat less obvi- 
ous, but nonetheless true, that a com- 
parable need exists in experiments 
which seek to determine how form 
perception is influenced by extrinsic 
variables such as size, contrast, 
method and degree of familiarization, 
etc. In studies of this sort, the ex-
perimenter commonly uses some small,
arbitrarily chosen set of stimuli:
sometimes simple geometrical forms;
sometimes a group of “nonsense”
shapes which he draws in a more or
less haphazard manner. If the results 
obtained are “significant” in the 
usual sense, we have some specifiable 
degree of confidence that they are gen- 
eralizable to people other than those 
used as subjects, but the degree to 
which they are generalizable to new 
stimuli remains a matter of conjec- 
ture. Yet the latter kind of generaliz- 
ation is no less important than the 
former. Only in rare cases of applied 
research is the investigator really
content with results which hold only
for the particular stimulus objects 
employed experimentally. 

Egon Brunswik (9, 10, 11) is per- 
haps the only psychologist who has 
ever given due weight to the im-
portance of stimulus-sampling, or of
situation-sampling in general. Al-
though the approach of this paper 
is somewhat different from Bruns- 
wik’s, for reasons which are devel- 
oped below, we wish to acknowledge 
freely Brunswik’s influence upon our 
own thinking, and to commend his 
writings on this subject to any reader 
unacquainted with them. Brunswik 
takes the reasonable position that re- 
sults with “ecological validity” may 
be obtained only by the use of experi- 
mental materials which are drawn 
from, and hence representative of, 
the real situations to which one 
wishes to generalize. Thus, in the 
study of shape perception, it would 
be desirable to experiment with the 
shapes of natural objects. Suppose, 
however, that we wish to investigate 
the learning and memory of shapes 
with which subjects are initially un- 
familiar: the requirement of unfamili- 
arity will obviously preclude the ex- 
perimental use of shapes which are 
commonly encountered. Is there any 
sensible procedure for choosing stim- 
ulus-materials in this sort of situa- 
tion? 

It is our belief, at this time, that 
the problem of generalizing from ex- 
perimental stimuli may profitably 
be broken into two parts. First, 
there is the problem of specifying the 
stimulus-domain, i.e., the problem
of drawing a sample of stimuli from
a parent population characterized by
certain determinate statistical parameters.
The stimulus-domain, or
parent population, includes all those
stimuli to which the results may be 
generalized, and is defined by the sta- 
tistical parameters which characterize 
it. In the following section we shall 
indicate a variety of particular meth- 
ods for drawing “random” patterns 
and shapes from such clearly defined 
hypothetical populations, to which
experimental results may then be generalized
with measurable confidence.

The second problem, which is 
really a special case of the first, is 
that of drawing a sample which has 
“ecological validity.” If our real aim 
is to generalize to natural forms, or 
to some subset thereof, it is neces- 
sary to estimate the psychologically 
important statistical parameters of 
these natural forms in order that ex- 
perimental materials may be con- 
structed to possess the same parame- 
ters. Thus, we are brought back to 
the acute need for a general psycho- 
physics of form. In the final section 
we shall discuss the kinds of physical 
analysis and measurement which ap- 
pear appropriate to such a psycho- 
physics. 


THE CONSTRUCTION OF STIMULI


All the methods described below 
for constructing nonsense shapes and 
patterns have in common the fact 
that the particular characteristics of 
each figure are randomly determined. 
Each method is, in effect, a set of 
rules by which points are plotted and 
connected in accordance with values 
obtained from a table of random 
numbers. Each method, or set of 
rules, thus determines a domain of 
stimuli. The stimuli actually con- 
structed for use in a given experiment 
will, if they are all constructed ac- 
cording to the same rules, be a ran- 
dom sample of the stimulus-domain 
defined by the set of rules. The ex- 
perimental results, consequently, may 
be generalized both to the entire 
stimulus-domain and to the appropriate
subject population.²

² The kind of double-generalization pro[...]
would require an error term which
included the variance due to subjects, the
variance due to stimuli, and the interaction
between them. In what is perhaps the most
obvious analysis-of-variance design, the
sub[...]ments mean square would [...]
error term to use [...]

The experimenter who desires to
use stimuli constructed in this man- 
ner must determine what set of rules 
will provide him with a stimulus 
population having the characteristics
he wants. If one desires to generalize
experimental results to the
world of real objects (chairs, airplanes,
people, etc.), it is necessary
to have a stimulus sample possessing
ecological validity. To construct 
nonsense stimuli of this sort one 
must know the pertinent parameters 
of the stimulus-domain of real ob- 
jects and use these parameters in 
constructing the experimental stim- 
uli. In the next section we shall dis- 
cuss some of the problems inherent 
in this methodological requirement 
and some of the attempts which have 
been made to solve them. 

In the present section, some gen- 
eral methods for constructing stimuli 
are described in sufficient detail that 
the reader, if he desires, may repeat 
the operations in order to develop 
additional stimuli belonging to the 
various stimulus-domains defined by 
the methods. It should be kept in
mind, however, that these methods
are described merely as examples
and are not intended to constitute a
comprehensive catalog of all possible
methods. Descriptions will be given
of methods for generating shapes
with closed or open contours,
for generating various kinds of patterns,
and for introducing systematic
variations or transformations of
shapes or patterns.

FIG. 1. THE CONSTRUCTION OF A [...] TO METHOD 1 (SEE TEXT)


Closed Contours—Angular Shapes 


Method 1. Starting with a sheet of 
graph paper—say 100 × 100—successive
pairs of numbers between 1 and
100 are selected from a table of random
numbers. Each pair will determine
a point which can be plotted on
the 100 × 100 matrix. The total number
of such points to be plotted can
be determined either randomly or 
arbitrarily. 

When all the points have been 
plotted, a straightedge is used to 
connect the most peripheral points 
in such a way as to form a polygon 
having only convex angles. This 
operation will usually leave some un- 
connected points within the polygon 
(Fig. 1a). When a point falls within 
some small, arbitrarily chosen distance
of the proper perimeter (e.g.,
the point between segments 7 and 8
in Fig. 1a) it is included even though
it makes a slightly concave angle,
since otherwise an indentation practically
dividing the shape into two
parts might later occur. The sides of
the polygon are numbered, and the 
points remaining inside are assigned 
letters. The table of random numbers 
is then used to determine which of the
central points is connected to which
side. In the example given, Point C
was connected to Side 2, forming in
the process Side 10 (Fig. 1b). At this
stage in the construction, the possibilities
of connecting points have
been changed. Point A may now be
taken into Sides 3, 4, 5, 6, 7, 8, or 10,
but not into Sides 1, 2, or 9. Point B
may be connected only to Side 2 or
Side 10. If Point A is connected to
Side 5, forming new Side 11, there
remains only the possibility of connecting
Point B to Side 2 or Side 10
(Fig. 1b). Connecting Point B
to Side 10 completes the shape, which
finally appears as shown in Fig. 1c.
It will be noted that every step
in the procedure is determined either
randomly or by the elimination of all
other possibilities. Furthermore,
every step is completely determinate
and can be duplicated by anyone using
the same rules and the same selections
from the table of random numbers.
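
The steps of Method 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure in every detail: a seeded pseudo-random generator stands in for the table of random numbers, the function names (`convex_hull`, `permissible_sides`, `method1_shape`) are hypothetical, and the rule that admits a slightly concave point lying near the perimeter is omitted. The sketch also assumes that no three plotted points are exactly collinear.

```python
import random


def cross(o, a, b):
    """Twice the signed area of triangle o-a-b (positive if counter-clockwise)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: the most peripheral points, counter-clockwise."""
    pts = sorted(set(points))
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def segments_cross(p, q, r, s):
    """True if segments pq and rs cross at a point interior to both."""
    return (cross(p, q, r) * cross(p, q, s) < 0
            and cross(r, s, p) * cross(r, s, q) < 0)


def permissible_sides(polygon, point):
    """Indices of sides that can absorb `point` without any lines crossing."""
    n = len(polygon)
    sides = [(polygon[i], polygon[(i + 1) % n]) for i in range(n)]
    allowed = []
    for i, (a, b) in enumerate(sides):
        others = sides[:i] + sides[i + 1:]
        if not any(segments_cross(a, point, r, s) or segments_cross(point, b, r, s)
                   for r, s in others):
            allowed.append(i)
    return allowed


def method1_shape(n_points, rng, size=100, max_tries=100):
    """Plot random points, take their convex hull, then connect interior points."""
    for _ in range(max_tries):
        pts = set()
        while len(pts) < n_points:
            pts.add((rng.randint(1, size), rng.randint(1, size)))
        polygon = convex_hull(pts)
        interior = sorted(pts - set(polygon))
        rng.shuffle(interior)  # random order in which interior points are taken
        for point in interior:
            allowed = permissible_sides(polygon, point)
            if not allowed:
                break  # no permissible side remains: draw a fresh set of points
            side = rng.choice(allowed)
            polygon.insert(side + 1, point)  # the chosen side is split in two
        else:
            return polygon
    raise RuntimeError("no valid shape found")
```

Calling `method1_shape(8, random.Random(1956))` returns the vertices of one shape in order around the perimeter; repeating the call with the same seed reproduces the same shape, which mirrors the point that the construction is fully determined by the random selections.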

FIG. 2. SHAPE CONSTRUCTED BY THE RULES OF METHOD 2: a. INCOMPLETE CONSTRUCTION DEMONSTRATING PERMISSIBLE AND NONPERMISSIBLE CONNECTIONS; b. COMPLETED SHAPE

Method 2. This method of constructing
random shapes is also
started by plotting successive pairs
of random numbers as coordinates on
graph paper. As each point is plotted
it is given a number so that eventually
all are numbered serially.
These points are then connected in
the order in which their serial numbers
first appear in a table of random
numbers, except that numbers which
violate certain rules of construction
are rejected. The incomplete construction
shown in Fig. 2a will provide
examples of permitted and nonpermitted
connections. The rules for
connecting points are as follows:

a. No line may be drawn twice. 
Assume, in Fig. 2a, that the last line 
drawn was from Point 2 to Point 5. 
If the next number in the table were 
2, it would be rejected since that con- 
nection has already been made. 

b. No line may be drawn which 
completely encloses a point within 
the perimeter of the figure. From 
Point 5 it would not be permissible to 
draw a line to Point 6 or to Point 4, 
since either action would completely 
enclose Points 3 and 8. 

c. No two points may be directly 
connected if they are already con- 
nected by a path which follows per- 
imeter lines without passing through 
any other plotted points. For example,
Point 5 may not be connected
to Points 3 or 7, Point 3 may not be
connected to Points 5 or 6, and Point
2 may not be connected to Point 4.

d. The figure is complete when 
each point has been connected to at 
least two other points. It sometimes 
happens that the table of random 
numbers leads one to a point which 
already has all the other connections 
allowed it. In this case one of the 
other points is chosen randomly as a 
new origin and the regular process is 
continued. The incomplete shape of 
Fig. 2a is shown in a completed form 
in Fig. 2b.

As is the case with all the methods 
described in this paper, this method 
is completely objective. The result- 
ing figure could be reproduced, if 
necessary, from a set of coded instruc- 
tions consisting only of the numbers 
originally selected from the table.

Unlike Method 1, Method 2 usually
generates shapes containing some
angles in addition to all those at
originally plotted points. This difference
is emphasized by Rule c of
Method 2. [...] closed, and (b)
[...] not cross, i.e., [...] have [...]
only at the original points. In
Method 2, on the other hand,
there may be “emergent” angles at
places other than originally plotted
points, and the figures produced tend
to be characterized by “good continuation.”
Again, it is Rule c of Method
2 which causes many of the perimeter
lines of the final figure to be continuations
of other perimeter lines.

Comparing the two methods in
terms of the informational content
of the shapes produced shows that
in Method 1 information (in addition
to that required to locate the original
points) is used only in connecting the
interior points to the sides of the
original perimeter, whereas in Method
2 information is used in making all
connections between plotted points.
For this reason a Method 2 shape
composed of n original points and
containing n+k angles (k representing
the number of “emergent” points)
will contain more information than a
Method 1 shape composed of n original
(and final) points. Because of the
good continuation introduced into
the figure, however, the Method 2
shape having n+k points will contain
less information than would a Method
1 shape having n+k original points.

Method 3. Fitts, Weinstein, Rappaport,
Anderson, and Leonard (15)
have developed a technique for constructing
“metric” figures, the informational
content of which may be
easily and accurately determined.
Starting with a somewhat smaller
matrix—say, 8 × 8—the number of
cells to be filled (from the bottom up)
in each column of the matrix is randomly
determined. This method produces
shapes which belong to a relatively
small stimulus-domain and
which are equal in informational content.
A variation of this method involves
allowing each possible column-height
to appear only once in each
shape, with the order of appearance
determined randomly. This second
stimulus-domain contains members
which are equal in area and, consequently,
contain less information
than the shapes first described. Still
another variation may be introduced
by reflecting each shape on one of its
axes to produce a symmetrical shape
containing no more information than
its nonsymmetrical predecessor. Examples
of these various classes of
metric figures may be found in Reference 15.
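
The informational bookkeeping for the metric figures of Method 3 is easy to make concrete. The sketch below is illustrative only: it assumes that column heights run from 1 to 8 and are equally probable (the text does not state the range), and the function names are hypothetical. Under that assumption an unconstrained 8 × 8 figure carries 8 log₂ 8 = 24 bits, whereas a figure in which each height appears exactly once carries log₂ 8!, or about 15.3 bits.

```python
import math
import random


def metric_figure(rng, size=8):
    """Column heights chosen independently (assumed range: 1..size)."""
    return [rng.randint(1, size) for _ in range(size)]


def constrained_figure(rng, size=8):
    """Each possible column height appears exactly once, in random order."""
    heights = list(range(1, size + 1))
    rng.shuffle(heights)
    return heights


def mirrored(heights):
    """Reflect a figure about a vertical axis to obtain a symmetrical one."""
    return heights + heights[::-1]


def information_bits(size=8, constrained=False):
    """Bits needed to specify one figure when all figures are equally likely."""
    if constrained:
        return math.log2(math.factorial(size))
    return size * math.log2(size)
```

The mirrored figure is fully determined by its left half, which is why reflection adds no information.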


Closed Contours—Curved Shapes 


Method 4. This method describes
a procedure for making wholly or
partially curved shapes from the
angular shapes constructed by
Method 1 or 2. This procedure may
appear to be somewhat involved, but
actually it requires more time to describe
than to perform. Essentially,
it consists merely of replacing angles
with inscribed arcs, of curvature
chosen randomly within limits imposed
by the figure.

FIG. 3. METHOD FOR INTRODUCING “RANDOM” CURVES INTO AN ANGULAR NONSENSE SHAPE. THE ORIGINAL SHAPE IS THE SAME ONE WHICH APPEARED IN FIG. 1c

458 FRED ATTNEAVE AND MALCOLM D. ARNOULT

For purposes of demonstrating the
method, let us start with the shape
described and constructed under
Method 1 (Figs. 1a-1c). It is decided
(arbitrarily or randomly) that
four of the twelve angles are to be
curved. Let us suppose that Angles
C, F, J, and K (Fig. 3) are chosen.
(For convenience of exposition the
angles have been assigned the letters
A through L.) The first step in the
process consists in constructing line
Cp, which is the bisector of ∠BCD.
Then, the shorter of the two arms of
the angle (in this case, line BC) is
divided into equal units. These units
may be chosen for convenience. For 
example, Fig. 3 was constructed on a 
100 × 100 matrix having matrix units
equal to 0.20 in., and Line BC was 
arbitrarily divided into segments of 
0.25 in. each. It should be noted that 
the divisions of the line are num- 
bered in sequence, starting always 
from the apex of the angle. 

One of these numbered points on 
line BC is now chosen at random and 
a perpendicular from Line Cp to it 
is constructed (Line 5-g). This line 
(5-g) now becomes the radius of an 
arc which is inscribed within ∠BCD.
The arc is tangent to Line BC and
Line CD at points equidistant from
C. Thus, ∠BCD has now been replaced
by a curve (actually, two linear
segments and an arc) going from
B to D. 

Point F has been curved by the 
same process. Angle EFG is bisected 
by line Fr, and Line FG is divided 
into equal segments. Division 8 having
been chosen at random, line 8-s
is constructed and used as a radius
for inscribing a curve within ∠EFG.

The next two constructions dem- 
onstrate the complex curvature which 
may result when successive points 
are chosen to be curved. Point J 
is curved by the process described 
above, with Line 13-u being used as 
the radius of an arc inscribed within 
∠IJK. However, in curving Point
K it is necessary to inscribe an arc
within ∠J′KL, not within ∠JKL.
Point J′ is the point at which the arc
constructed with radius 13-u becomes
tangent to line JK.

If it is so desired, all the points of
an angular figure may be curved. It
should be noted, however, that the
shorter arm of every angle is divided
into segments, and that its divisions
are numbered beginning with zero. If
the zero is the random choice, the resulting
curve will have zero radius,
i.e., that angle remains as originally 
drawn. 
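
The geometry of the inscribed arc in Method 4 can be checked numerically. In the sketch below (hypothetical function names; a seeded generator again stands in for the table of random numbers) a division at distance d from the apex is chosen on the shorter arm; the perpendicular erected there meets the bisector at the center of the arc, so that the radius equals d tan(α/2), where α is the angle being rounded. The sketch treats one angle in isolation and does not handle the case of adjacent curved angles (the J′ adjustment described above), nor a straight angle of 180°.

```python
import math


def _unit(dx, dy):
    """Unit vector in the direction (dx, dy)."""
    length = math.hypot(dx, dy)
    return dx / length, dy / length


def random_tangent_distance(apex, end_a, end_b, unit, rng):
    """Pick a numbered division (0, 1, 2, ...) on the shorter arm of the angle."""
    shorter = min(math.dist(apex, end_a), math.dist(apex, end_b))
    return rng.randint(0, int(shorter // unit)) * unit


def inscribed_arc(apex, end_a, end_b, dist):
    """Arc tangent to both arms at distance `dist` from the apex.

    Returns (center, radius, tangent_point_a, tangent_point_b).
    """
    ax, ay = apex
    ua = _unit(end_a[0] - ax, end_a[1] - ay)
    ub = _unit(end_b[0] - ax, end_b[1] - ay)
    cos_full = max(-1.0, min(1.0, ua[0] * ub[0] + ua[1] * ub[1]))
    half = math.acos(cos_full) / 2.0          # half of the angle at the apex
    radius = dist * math.tan(half)            # perpendicular from arm to bisector
    bis = _unit(ua[0] + ub[0], ua[1] + ub[1])  # direction of the bisector
    reach = dist / math.cos(half)             # apex-to-center distance
    center = (ax + bis[0] * reach, ay + bis[1] * reach)
    tangent_a = (ax + ua[0] * dist, ay + ua[1] * dist)
    tangent_b = (ax + ub[0] * dist, ay + ub[1] * dist)
    return center, radius, tangent_a, tangent_b
```

For a right angle with its apex at the origin and `dist = 2`, the arc has radius 2 and center (2, 2); with `dist = 0` the radius is zero and the angle is left as drawn.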

Method 5. Angular shapes can be 
changed into curved shapes by a pro- 
cess of photographic blurring. The 
figure is first photographed and then, 
with the help of an enlarger, is
printed out-of-focus on high contrast 
paper. The resulting image has a con- 
tour which is curved, but which is 
also graded in density. A repetition 
of the process of photographing and 
printing, however, will eliminate the 
density gradient, producing a shape 
with contours which are rounded and 
well-defined. The amount of blur 
may, of course, be carefully con- 
trolled, and a graded series of curved 
shapes may be made from a single 
prototype shape.


Open Contours


Method 6. There are many ways in
which open-contour nonsense shapes
may be constructed from a table of
random numbers, but all that we
have used have been variations on
one basic method. Starting from the
approximate center of a matrix of
convenient size, a line is drawn to one
of the eight intersections nearest the
starting point. These eight intersections
(or, more generally, directions)
have been assigned numbers as shown
in Fig. 4a. The intersection on the
graph paper at which the first line
terminates becomes the origin for
the second line to be drawn, and so
on. A difficulty with this method is
that there is no intrinsic criterion for 
completeness in such a figure. One 
objective rule is to determine, before 
beginning the construction, the total 
number of digits to be selected from 
the table and to consider the figure 
complete when that number of lines 
has been drawn. 
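
Method 6 amounts to a random walk on the intersections of the matrix, and may be sketched as follows. The numbering of the eight directions in Fig. 4a is not reproduced in the text, so the assignment used here (0 for "up," then clockwise) is an assumption, as are the function names and the starting point.

```python
import random

# Assumed numbering of the eight directions: 0 = up, then clockwise.
DIRECTIONS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]


def open_contour(digits, start=(50, 50)):
    """Trace an open contour: each digit moves to one of the 8 nearest intersections."""
    path = [start]
    for d in digits:
        dx, dy = DIRECTIONS[d]
        x, y = path[-1]
        path.append((x + dx, y + dy))
    return path


def random_open_contour(n_lines, rng, start=(50, 50)):
    """Fix the number of lines in advance, as the stopping rule requires."""
    digits = [rng.randrange(8) for _ in range(n_lines)]
    return open_contour(digits, start)
```

Because the number of lines is fixed before construction begins, every contour drawn with the same `n_lines` belongs to the same stimulus-domain.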

Many variations on this basic technique
may be introduced. For example,
for some purposes it may be
desired to allow only four directions
in which the contour may vary; also,
the length of each line may be determined
randomly as well as the direction.
Partially or wholly curved contours
may be produced by this
method as follows: the radius of curvature
of the arc drawn to connect
successive intersections along the
horizontal and vertical axes of the
matrix is set as one-half the length
of a matrix unit. To connect two
intersections diagonally separated,
the arc would have a radius equal to
one matrix unit. Thus, for example,
one might determine randomly for
each line constructed: (a) which two
intersections will be connected, (b)
whether the connection is to be linear
or curved, and (c) the direction of
curvature. Figure 4b was drawn by
this technique. Additional variations 
on these methods may be provided 
by using semi-log, log-log, or polar 
coordinate matrices on which to con- 
struct the nonsense contours. 


Patterns 


Method 7. Although the more obvious
ways of generating random patterns
have been used by a number of
investigators, the possibilities of this
approach to the construction of complex
visual displays have never been
adequately explored. In general the
practice has been to construct a
matrix of some given size and then to
determine randomly which cells are
to be filled. Patterns of dots were
constructed in this fashion by Kaufmann
et al. (19), French (16), and
Klemmer and Frick (20), for example.
Attneave used the same approach,
including the introduction of
a symmetry factor, in a study of the
effect of redundancy on memory for 
patterns (4). In another slight varia- 
tion Arnoult used random shapes as 
elements in constructing random pat- 
terns for use in a learning experiment 
(2). Patterns generated in this fash- 
ion are very attractive as stimuli be- 
cause it is usually possible to com- 
pute fairly precisely the informa- 
tional content of the display. 
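
A sketch of the matrix procedure of Method 7 follows, together with the corresponding information computation. It assumes, for illustration, that the number of filled cells is fixed in advance and that every arrangement of that many cells is equally likely, in which case the informational content is the logarithm (base 2) of the number of possible arrangements. The function names are hypothetical.

```python
import math
import random


def random_pattern(rows, cols, n_filled, rng):
    """Fill exactly `n_filled` randomly chosen cells of a rows x cols matrix."""
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    filled = set(rng.sample(cells, n_filled))
    return [[1 if (r, c) in filled else 0 for c in range(cols)]
            for r in range(rows)]


def pattern_information_bits(rows, cols, n_filled):
    """log2 of the number of equally likely patterns with `n_filled` dots."""
    return math.log2(math.comb(rows * cols, n_filled))
```

For example, two dots in a 2 × 2 matrix can be arranged in six ways, so such a pattern carries log₂ 6, or about 2.58 bits.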


Systematic Variations 


Frequently it is desired to con- 
struct “families” of shapes having 
known physical relationships among 
the individual members. Again, there 
are many possible techniques for ac- 
complishing this end. The following 
two methods represent two kinds of 
systematic variations which have re- 
cently been used. 

Method 8. A prototype shape is 
constructed by any of the methods 
so far described. Then, each point 
is moved to a new location and the 
connecting lines redrawn as before. 
In moving the points, any of the fol- 
lowing parameters may either be 
held constant or varied randomly: 
(a) the number of points moved, (b) 
the particular points moved in mak- 
ing successive variations on the same 
prototype, (c) the distance through
which a point is moved, and (d) the
direction of movement. A number of
variations made from a given prototype
will form a distribution of shapes
which “vary about” the prototype.
Stimuli of this sort were used recently
by Attneave in testing the
hypothesis that knowledge of the
prototype shape, or “schema,” would
facilitate discrimination of the variations
in paired-associate learning (6),
and by Arnoult in a study of the effect
of predifferentiation training
on recognition (1). A typical prototype
shape and its variations are
shown in Fig. 5.

Method 9. A somewhat different
technique for creating “families” of
shapes has been developed at Stanford
by LaBerge and Lawrence (23).
Initially, a random shape is constructed
by a method essentially
the same as those described in
Method 1 and Method 2 (actually,
LaBerge and Lawrence simply connected
randomly chosen points into
the polygon of minimal perimeter).
Then, each point on the contour is
assigned randomly chosen “x” and
“y” increments to its coordinates,
and these new coordinates are plotted
and connected on a fresh matrix.
These same increments are then
added to the new coordinates and a
third figure is constructed. This process
may be continued until one has
constructed a row of, say, six figures,
each differing from its immediate
neighbors by a constant amount of
distortion as measured by the distance
through which the points move.
The next step is to label the former
“x” increments as “ys” and the
former “y” increments as “xs.”
These new increments are added to
the coordinates of the points of all six
of the figures already constructed,
and the process of constructing successive
shapes is repeated until there
is a column of six shapes for each of
the original six shapes. The final result
is a matrix of 36 shapes in which
any two adjacent shapes in a row or
column are equally spaced in terms 
of the average distance the points 
have moved. Matrices of stimuli 
of this sort are currently being used 
by LaBerge and Lawrence in studies 
of transfer. 
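
The construction of Method 9 reduces to simple coordinate arithmetic, sketched below with hypothetical names. The entry `matrix[j][i]` is the prototype after i applications of the original increments (movement along a row) and j applications of the interchanged increments (movement down a column). Because interchanging the x and y increments of a point leaves the length of its displacement unchanged, adjacent shapes are equally spaced along rows and columns alike.

```python
import math


def family_matrix(points, increments, n=6):
    """Build an n x n matrix of shapes from a prototype and per-point increments."""
    swapped = [(dy, dx) for dx, dy in increments]  # former x's become y's
    matrix = []
    for j in range(n):          # steps down a column (swapped increments)
        row = []
        for i in range(n):      # steps along a row (original increments)
            shape = [
                (x + i * dx + j * sx, y + i * dy + j * sy)
                for (x, y), (dx, dy), (sx, sy) in zip(points, increments, swapped)
            ]
            row.append(shape)
        matrix.append(row)
    return matrix


def mean_step(shape_a, shape_b):
    """Average distance through which corresponding points have moved."""
    return sum(math.dist(p, q) for p, q in zip(shape_a, shape_b)) / len(shape_a)
```

In use, `points` would be the vertices of a random shape and `increments` a list of randomly chosen (x, y) increments, one pair per vertex.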

As has been emphasized a number 
of times in the preceding discussion, 
these methods for constructing “random”
shapes are only a few which
have been selected to show some of
the classes of shapes which can be
constructed. The number of differ- 
ent sets of rules which can be de- 
veloped for plotting and connecting 
points taken from a table of random 
numbers is limited only by the fer- 
tility of the individual experimenter’s 
imagination. It should be reiterated,
however, that using stimuli constructed
by these “random” methods
does not insure that the generalizations
resulting from the research will
be pertinent to all other kinds of
visual stimuli. It guarantees only
that the results will be generalizable
within a particular stimulus-domain,
i.e., to any other stimuli constructed
by the same rules.


ANALYSIS OF NATURAL FORMS


Let us now return to a problem 
which the methods discussed in the 
previous section by no means obviate.
We still need a technique, or a
set of techniques, by means of which 
physical measurements of a psycho- 
logically relevant sort may be ob- 
tained for forms which we have not 
constructed ourselves. Any method 
of “random” construction must em- 
ploy some set of rules, either arbi- 
trary or otherwise, and these rules 
will strictly determine the class-char- 
acteristics, or statistical parameters, 
of the shapes constructed. We 
should like to be able to devise rules 
such that our synthetic shapes might 
possess the statistical characteristics 
(but not the familiarity) of natural 
shapes to which we wish to gen- 
eralize. At present, we lack not only 
a factual knowledge of the values of 
these statistical parameters, but also 
a methodology to guide us in their 
determination. Likewise, when some 
experimental variation of form is 
found to produce a certain effect in 
the laboratory, it is necessary that 
the variable in question be identifia- 
ble and measurable outside the labo- 
ratory if the results are to be gen- 
eralized. Unfortunately, however, it 
is much harder to measure form than 
to manipulate it. 

Relatively few scientists have seri- 
ously applied themselves to the prob- 
lems of analyzing and describing 
form; these problems seem to have 
fallen into the cracks between sci- 
ences, and no general quantitative 
morphonomy has ever developed. 
D’Arcy Thompson's Growth and Form 
(27) is virtually the only major work 
in the field: it is a fascinating and
impressive book, but its contribution
to the identification of psychophysical
variables is limited. Rashevsky,
whose work in mathematical biophysics
is in some respects a continuation
of Thompson’s, has been more
directly concerned with psychologi- 
cally relevant measures of form. 
Abstraction of contour. Considering 
that the first step in the analysis of a 
shape is the abstraction of its con- 
tour, Rashevsky (25, p. 449) devised 
a simple hypothetical nerve-net with 
this function. Suppose that the stim- 
ulation of the retina is projected to 
some central area as an activity of 
sharply localized excitatory fibers 
and of inhibitory fibers slightly more 
diffuse in their projection. If certain 
constants of the system have proper 
values, excitation from any area of 
uniform brightness will be sup- 
pressed, except at a contour where 
such an area is bounded by a darker 
one which provides less inhibition. 
This nerve-net has a fairly close 
analogue in the following photo- 
graphic process. A negative and a 
positive transparency, separated by 
a thin plastic sheet, are precisely 
superimposed so that they “cancel” 
each other when viewed from a right 
angle. A print is made by transmit- 
ting light from a diffuse source (e.g., 
the ground glass of a contact printer) 
through the superimposed positive 
and negative to a high-contrast paper
placed in contact with the negative. 
In the case of a black object on a 
white ground, or vice versa, light 
can angle through both positive and 
negative only at the contour, and 
the resulting print is indistinguish- 
able from an outline drawing of the 
object. In the case of more complex 
pictures, the abstraction of sharp 
brightness-gradients preserves texture,
as well as contour: this is illustrated
clearly in Fig. 6. A picture obtained
in this way may be thought of
as a differential (with respect to 
brightness) of the original, involving 
a “delta” of finite magnitude. If
a smaller “delta” had been taken in
the derivation of Fig. 6 (by reducing
the space between the superimposed
positive and negative), the iris and
pupil of Thompson's eye, for example,
would appear in outline instead
of as a black dot.

In 1948 one of the authors (Attneave),
in collaboration with John
Stroud, attempted to develop
this photographic technique to a degree
of precision such that the total
reflectance of the differential picture
might serve as an index of the complexity
of the original. That attempt
was unsuccessful for several reasons,
having to do chiefly with the unreliability
of photographic operations:
e.g., the initial step of making a positive
and a negative which would adequately
cancel always required considerable
cut-and-try. It may be
added that the process is a close relative
of one which has long been used
to produce a “bas-relief” effect, and
that the Eastman Laboratories have
recently employed a similar technique
with color film to obtain photographs
which look remarkably like
paintings.

An electronic device lately de- 
veloped by Kovasznay and Joseph 
at the National Bureau of Standards 
appears to accomplish much the same 
result as the photographic process
described above, but in a manner subject
to more precise control. The
beam of a cathode ray tube, moving
in a complex scan which covers the
field in two orthogonal dimensions,
transmits light through a photographic
transparency to a photoelectric
cell. The electrical signal thus
generated is differentiated and squared
electronically, and then fed into a receiving
scope where it modulates a
beam synchronized with the transmitting
beam. Illustrations of the
results, which are presented in the
descriptive note of Kovasznay and
Joseph (21), could be mistaken for
the efforts of a somewhat naive artist.

A group of engineers in the Lin- 
coln Laboratory of M.I.T., including 
Oliver G. Selfridge, Gerald P. Din- 
neen, and Marshall Freimer, are cur- 
rently experimenting with the use of 
digital computers to perform opera- 
tions relevant to object identifica- 
tion. They have been successful in 
programming a contour-abstracting 
operation; this is preceded by an av- 
eraging operation, which rids the fig- 
ure of irrelevant detail, and followed 
by an operation which abstracts 
angles, or regions of high curvature, 
from the contour (26). 
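
The first two operations of such a sequence, averaging followed by contour abstraction, can be illustrated on a small binary matrix. The sketch below is not a reconstruction of the Lincoln Laboratory programs, whose details are not given here; the 3 × 3 majority rule used for averaging and the 4-neighbor test used for contour abstraction are assumptions chosen for simplicity, and the function names are hypothetical.

```python
def smooth(image):
    """Averaging: a cell is filled if more than half of its 3 x 3 window is filled."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [image[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = 1 if 2 * sum(window) > len(window) else 0
    return out


def contour(image):
    """Keep filled cells that have at least one unfilled (or missing) 4-neighbor."""
    rows, cols = len(image), len(image[0])

    def filled(r, c):
        return 0 <= r < rows and 0 <= c < cols and image[r][c] == 1

    neighbors = ((-1, 0), (1, 0), (0, -1), (0, 1))
    return [[1 if filled(r, c)
             and not all(filled(r + dr, c + dc) for dr, dc in neighbors)
             else 0
             for c in range(cols)]
            for r in range(rows)]
```

Applied to a solid block accompanied by an isolated stray cell, `smooth` removes the stray cell (and rounds the corners of the block), after which `contour` retains only the outline.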

The mere abstraction of contour, 
whether by an objective process or 
with the aid of the experimenter’s 
own perceptual machinery, does not 
in itself constitute quantification. It 
does, however, contribute to the iso- 
lation of that which is to be quanti- 
fied: i.e., form. Whenever we speak 
of form, we are referring to a some- 
what vague set of properties which 
are invariant under transformations 
of color and brightness, size, place, 
and orientation; our definition may 
or may not be extended to specify 
invariance under projective (or per- 
spective) transformations. Contour 
is characterized by invariance under 
color and brightness transformations. 
Attneave (3) has previously pointed 
out the related (though not equiva- 
lent) fact that contours are regions 
of relatively high informational content.
Analysis of contour. There are vari- 
ous practical reasons for wishing to 
be able to describe a contour in terms 
which are independent of its size, 
place, and orientation. For example, 
subjects are often required to draw 
figures from memory: such drawings 
cannot be fairly evaluated by any
simple method of superimposing a
drawing upon the original and meas- 
uring deviations, because of differ- 
ences in scale, etc. If both the origi- 
nal and the reproduction could be 
represented in terms descriptive of 
form alone, they could then be com- 
pared objectively. 

Such a representation may take the 
form of a single function. If the re- 
ciprocal of the radius of curvature of 
a closed contour is plotted against 
distance along the contour, a peri- 
odic function results. This function 
may be normalized (i.e., rendered in- 
dependent of the scale of the original 
figure) by assigning a value of unity 
to the perimeter of the figure and ex- 
pressing radius of curvature in com- 
parable terms, or by setting equal to 
unity the area under one period of 
the function. An angle is represented 
by a vertical line which rises (or falls, 
in the case of a concave angle) to in- 
finity; a spike of this sort, of infinite 
height, infinitesimal width, and determinate
area, is the so-called δ-function
of Dirac, and is amenable to
mathematical treatment.³

³ This system of representation has been developed
in considerable detail by Oliver
Strauss (personal communication).

[...] the following [...] used. Imagine a
[...] guided over [...] contour
being followed. Wherever an
angle occurs in the contour, the angle
θ of the front wheel will be 90°; thus
the function will always have some
value between plus and minus 90°.
Radius of curvature, r, is related to
θ by the equation r = L cot θ, in
which L is the distance between the
front and rear wheels. Normalizing
may be accomplished by giving the
perimeter of the figure unit value, and
setting L at some standard fractional
value. If L is made to equal 1/2π,
regular polygons will be represented
by square waves regularly alternating
between 0 and 90°, a circle will
become a horizontal line with an
ordinate of 45°, and certain other
regularities will be uniquely represented;
this value of L is somewhat
large for convenient use with more
complex shapes, however. The interested
reader will have little difficulty
in working out further details
of the system. It has the advantage
of specifying an actual measuring device
which is practical and simple to
construct. Automatic recording of
the function could be arranged with 
two pairs of selsyns: one translating 
the rotation of the front wheel into a 
movement of the recording paper; 
the other coupling the angular posi- 
tion of the front wheel with the posi- 
tion of a recording pen. 
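
The relation between steering angle and radius of curvature may be checked numerically. What follows is only a minimal sketch in Python, assuming that the relation reads r = L cot θ and that the perimeter has been normalized to unity with L = 1/(2π). The function name `steering_angle_deg` and its treatment of the limiting cases are illustrative assumptions, not part of the system described above.

```python
import math


def steering_angle_deg(radius, wheelbase=1 / (2 * math.pi)):
    """Front-wheel angle theta in degrees, assuming r = L * cot(theta).

    A negative radius stands for a concave stretch of contour and
    therefore yields a negative angle.
    """
    if radius == 0:
        return 90.0  # abrupt convex angle: the wheel is turned fully sideways
    if math.isinf(radius):
        return 0.0   # straight segment: no steering at all
    return math.degrees(math.atan(wheelbase / radius))


# A circle of unit perimeter has radius 1 / (2 * pi).
circle_radius = 1 / (2 * math.pi)
print(steering_angle_deg(circle_radius))
```

Under these assumptions a circle of unit perimeter gives a constant reading of 45°, and the sides and corners of a polygon give readings of 0° and 90° respectively, in agreement with the description in the text.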

Both of the functions just described
have a serious disadvantage.
Suppose we wish to compare two
shapes which have a part-to-part or
part-to-whole similarity: say, the
outline of a cow's head with the outline
of a whole cow. The normalizing
factors which will be employed on a
basis of perimeter or area will obviously
not be such as to give comparable
representation to the similar portions
of the outlines.

The method next to be described 
avoids this difficulty, though it is not 
without limitations of its own. Instead
of describing the contour by
means of a continuous function, we
may attempt to analyze it into parts 
which are individually homogeneous, 
and hence amenable to approximate 
description in terms of a few stand- 
ardized dimensions. It is usually 
possible to construct a polygon about
a figure made up of complex lines 
and curves, as in Fig. 7, by drawing 
tangents (a) at points of zero curva- 
ture (e.g., CD, IJ, etc.: whenever a 
curve changes from concave to con- 
vex, it must have an intermediate 
point of zero curvature), (b) at points
of minimal curvature, where a decrease
in curvature is followed by
an increase (e.g., FG), and (c) at discontinuities
of slope, or angles (e.g.,
AB, GH, etc.). The series of lines 
thus formed may be described simply 
by stating the slope and length of 
each line in succession, but this description
is peculiar to a given orientation
and size of the figure. It may
be rendered orientation-free and scale- 
free by specifying instead, for each 
pair of adjacent segments, (a) the 
change in direction (in degrees), and 
(b) the change in the logarithm of 
length, as the contour is followed in a 
clockwise direction. Curves are 
treated as “rounded-off” angles: i.e.,
a curve is approximated by an arc 
located tangent to two successive 
lines of the polygon we have been 
discussing. In most cases, the size of 
the arc will be limited by the length 


4 Several other possible pairs of coordinates 
convey the same information. What is re- 
quired, essentially, is to describe the shapes 
of successive segments of the polygon, taken 
in pairs. Measures of any two angles of such 
a triangle, or any two ratios of sides or differ- 
ences between logarithms of sides, or any 
combination of an angle and a comparison of 
sides, is adequate to specify the shape of the 
triangle. The combination above is chosen 
for its intuitive appeal; also because errors of 
measurement have a more uniform effect on 
these coordinates than on certain others. 


of the shorter of the two segments. 
Hence curvature is conveniently ex- 
pressed by a third coordinate speci- 
fying (c) the proportion of the distance 
between the apex of the angle and the 
end of the shorter segment at which the 
arc best approximating the curve is 
tangent. This coordinate will usually 
have some value between 0 and 1.0, 
with 0 indicating an abrupt angle 
(radius of curvature equal to zero) 
and 1.0 indicating an arc which is 
tangent to the shorter segment 
at its end. In the case of Fig. 7, 
for example, (c) would have a 
value of 0 at A and M, a value of 
1.0 at G, and a value of about .8 at C 


FIG. 7. ILLUSTRATION OF METHOD FOR
QUANTIZING IRREGULAR CONTOUR 


(note that the arc best approximat- 
ing a curve will not necessarily have 
the same point of tangency as the 
original curve). When the arc ap- 
proximating a curve turns through 
more than 180°, as in the case of the 
bulbous projection in the JKL re- 
gion, the value of (c) will not remain 
between 0 and 1, since some of the 
points of tangency are on extensions
far from one another as is possible in
a closed contour, and jagged or sinu- 
ous shapes would have low values 
(see Householder, 18). The neuro- 
logical terms in which this model is 
presented need not be taken too seri- 
ously; Rashevsky’s basic idea might 
equally well be applied to the pro- 
gramming of a man-made computer, 
or to a series of photographic opera- 
tions. 

Deutsch (12) has recently sug- 
gested a model for shape perception 
which is somewhat akin to Rashev- 
sky’s. Since it may be described very 
simply in terms of geometrical con- 
cepts, we shall ignore the neural 
mechanisms which Deutsch proposes 
as its basis. Suppose that a perpen- 
dicular is drawn to a closed contour 
at every point along its length. Each 
such perpendicular will contain a 
segment which lies inside, and is 
bounded by, the contour. The 
lengths of these segments will have 
some distribution which will depend 
upon the shape of the contour; this 
distribution may be rendered size- 

pressing the length 
as a proportion of 
Ontour. In the Case 


supple- 
hanism, 
tour-fol- 
ats have 
ating a 
om a tri- 
: hat regu- 
lar polygons with even numbers of 
sides should be more difficult to discriminate
from one another than
from odd-sided polygons. 

Merely to order shapes along a 
compactness-dispersion continuum 
requires nothing so elaborate as the 
Rashevsky model outlined above. 
The relationship of the perimeter of a 
shape to its area provides an attrac- 
tively simple means of measuring 
this characteristic. The quotient 
P/A, which has been employed by 
some investigators (8, 17), is unsatis- 
factory from our standpoint because 
it varies with size as well as with 
shape, but either $P^2/A$ or $P/\sqrt{A}$ is
size-invariant. These ratios may be
transformed in various ways to suit
the user’s convenience; e.g., the measure

$$D = 1 - \frac{2\sqrt{\pi A}}{P}$$


expresses dispersion as some number 
between zero and one, assigning zero 
value to the most compact figure pos- 
sible, the circle. Dispersion (as meas- 
ured by any such relationship of 
perimeter to area) is not the same as
complexity (in the sense of number of
parts). Although a deeply convoluted
or jagged figure will indeed tend to 
have a high dispersion value, so will 
a very thin rectangle or ellipse. 
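
The behavior of such a dispersion measure is easily verified for simple figures. The sketch below is written in Python and assumes that the measure has the form D = 1 − 2√(πA)/P, which gives zero for a circle as the text requires. The helper names `polygon_area_perimeter` and `dispersion` are hypothetical, and polygons are supplied as lists of vertex coordinates.

```python
import math


def polygon_area_perimeter(vertices):
    """Area (shoelace formula) and perimeter of a simple polygon."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the contour
        twice_area += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(twice_area) / 2, perimeter


def dispersion(area, perimeter):
    """Assumed form of the measure: D = 1 - 2 * sqrt(pi * A) / P."""
    return 1 - 2 * math.sqrt(math.pi * area) / perimeter


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
thin_rectangle = [(0, 0), (10, 0), (10, 0.1), (0, 0.1)]

for name, shape in (("square", square), ("thin rectangle", thin_rectangle)):
    area, perimeter = polygon_area_perimeter(shape)
    print(name, round(dispersion(area, perimeter), 3))
```

The two polygons have the same area and the same number of parts, yet the thin rectangle receives a much higher value, which illustrates why dispersion should not be read as a measure of complexity.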
Bitterman, Krauskopf, and Hoch- 
berg (8, 22) have found that under 
conditions of low illumination or
short exposure, shapes are perceived
in much the same way as if they were
physically diffused, or blurred. These
experimenters created a physical dif- 
fusion model by cutting filter paper 
into various shapes and impregnating
it with an inhibitor of bacterial
growth. This inhibitor was then allowed
to diffuse from the paper into
bacterial cultures. The shapes which
most resembled each other after diffusion
were those most often confused
under adverse viewing conditions.
Likewise, identification of impover- 
ished stimuli was most impaired in 
the case of shapes characterized by 
relatively small detail, which would
be averaged out in a diffusion process.
These findings are interesting and
important, but the clumsy and somewhat
bizarre bacterial model does
not lend itself to quantitative prediction.
There is no apparent reason
why it might not be replaced with a
model employing optical blur, in
which case diffusion would be measured
by the radius of the blur circle.

An image may readily be blurred to
a measurable degree in an ordinary
photographic enlarger, and then resharpened
by means of high-contrast
paper or film (cf. Method 5 under
“Construction of Stimuli”). This
resharpening process introduces another
parameter, that of the black-white
threshold to be used in printing.
It is easiest photographically
to employ long exposure and
development, with the result that a
white-on-black figure will diffuse
outward into the field to the full extent
of the radius of the blur circle.
If it is desired that concavities and
convexities be affected symmetrically,
however (note that a psychological
question requiring an empirical
answer is thus raised), it is necessary
to resharpen the image into
black and white about some intermediate
gray such that a linear contour
between black and white fields
will be restored to its original position.⁵
This may be accomplished
with the aid of a suitable test-figure.


5 Dinneen (13) has succeeded in programming
a digital computer to perform averaging-and-resharpening
operations of almost exactly
this sort. His paper, which contains
copious illustrations of the effect of varying
resharpening threshold, is recommended to
the reader who finds the above discussion insufficiently
informative.

Over a wide range of values on the
resharpening threshold parameter, 
the process of blurring and resharpen- 
ing will decrease the dispersion 
($P^2/A$, or D) of any shape except a
circle, which is already the most com- 
pact shape possible. For any such 
value, dispersion will tend to decrease 
as amount of blur increases, but the 
form of this function—which we shall 
call a blur-response function—will 
vary with the shape involved and 
will describe certain important char- 
acteristics of the shape. Since the 
decrease in the function is associated 
with the “washing out” of progres- 
sively larger detail as the blur circle 
increases in size, any sharp drop indi- 
cates that the shape contains con- 
siderable detail of a magnitude indi- 
cated by the blur circle at that point. 
The blur-response function (or, per- 
haps better, its derivative) is thus a 
potential aid in the statistical evalua- 
tion of “magnitude of critical detail,” 
which Bitterman, et al., found to be 
of primary importance in determin- 
ing the identifiability of an impover- 
ished shape (8). A full exploration of 
the properties of such functions (par- 
ticularly in the case of shapes char- 
acterized by certain types of regu- 
larity, or redundancy) is beyond the 
scope of this paper; our purpose here 
is merely to suggest their feasibility 
and possible usefulness. One further 
point should be made, however; 
neither the blur-response function 
nor any other gestalt measure can 
possibly predict the relative identifi- 
ability of shapes except in a limited, 
statistical way. The kinds and de- 
grees of similarity which an impov- 
erished shape bears to all the other 
shapes with which it might be con- 
fused will clearly affect the difficulty 
with which it is identified (quite 
apart from any intrinsic properties 
it may have), and these similarities
may be evaluated, if at all, only by
recourse to analytical measures. A 
particular detail in a shape may or 
may not be critical to identification, 
depending upon the specific discrimi- 
nations which identification requires. 
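
A blur-response function of the kind proposed here can be approximated on a digital raster. The Python sketch below is offered under several loud assumptions: the figure is a grid of 0s and 1s, the blur circle is replaced by a square averaging window, resharpening uses a fixed threshold of .5, and perimeter is counted crudely as the number of unit cell edges separating figure from ground. All function names are hypothetical.

```python
def area(img):
    """Number of figure cells."""
    return sum(map(sum, img))


def perimeter(img):
    """Count unit edges between a figure cell and ground (or the grid border)."""
    rows, cols = len(img), len(img[0])

    def filled(r, c):
        return 0 <= r < rows and 0 <= c < cols and img[r][c] == 1

    edges = 0
    for r in range(rows):
        for c in range(cols):
            if img[r][c]:
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    if not filled(r + dr, c + dc):
                        edges += 1
    return edges


def blur_and_resharpen(img, radius, threshold=0.5):
    """Average over a (2*radius+1)-square window, then cut back to 0/1."""
    rows, cols = len(img), len(img[0])
    window = (2 * radius + 1) ** 2  # cells beyond the grid count as ground
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    total += img[rr][cc]
            out[r][c] = 1 if total / window >= threshold else 0
    return out


def p2_over_a(img):
    """Size-invariant dispersion ratio P^2 / A."""
    return perimeter(img) ** 2 / area(img)


# A 10 x 10 square carrying a spike one cell wide and four cells long.
shape = [[0] * 20 for _ in range(20)]
for r in range(5, 15):
    for c in range(5, 15):
        shape[r][c] = 1
for c in range(15, 19):
    shape[9][c] = 1

for radius in range(0, 3):
    print(radius, p2_over_a(blur_and_resharpen(shape, radius)))
```

With a window of radius 1 the spike is largely washed out and the ratio falls, which is the kind of drop the text associates with detail of about that magnitude. A circular window and a more careful perimeter estimate would be needed before such values could be compared across orientations.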

Gestalt measures, as defined earlier, 
all involve a reduction in the dimen- 
sionality of figures (sometimes, though 
not necessarily, to a single dimension) 
with a concomitant discarding of in- 
formation. The number of operations 
by means of which a shape may be 
“collapsed” to lower dimensionality 
is indefinitely large, as Selfridge (26) 
has recently pointed out. At the 
simplest level, for example, we may 
literally collapse a shape upon any 
spatial axis by plotting, as a function 
of distance along that axis, the thick- 
ness of the shape in the orthogonal 
dimension (26, Fig. 3). The axis in- 
volved need not even be linear; e.g., 
it might be a circle about the center 
of gravity of the shape (cf. Pitts and
McCulloch, 24).
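
The simplest of these collapsing operations may be illustrated in a few lines of Python. The sketch assumes that the shape is given as a raster of 0s and 1s and that the axis of projection is the horizontal one; the name `thickness_profile` is hypothetical.

```python
def thickness_profile(image):
    """Collapse a 0/1 raster onto the horizontal axis.

    Each entry is the number of figure cells in one column, i.e. the
    total thickness of the shape in the vertical dimension at that point.
    """
    return [sum(column) for column in zip(*image)]


figure = [
    [0, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
]
print(thickness_profile(figure))  # [1, 3, 2, 0]
```

Many different shapes share the same profile, which is the discarding of information that the text attributes to every measure of this class.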

Of all the conceivable physical 
measures of shape, analytical as well 
as gestalt, there are undoubtedly 
many that have little or no value
from a psychophysical point of view.
On the other hand, it appears unlikely
that any single system of physical
measurement can be optimal for
all psychophysical situations: in other 
words, we are suggesting that form 
perception involves a number of dif- 
ferent psychological mechanisms 
which function in a complementary, 
and to some degree overlapping, man- 
ner. Unfortunately, there is no quick 
and easy way to determine which 
physical measurements have greatest 
psychological relevance; only experimentation
can answer this question.
The preceding discussion and review 
may at least serve, however, to alleviate
somewhat the paucity of hypotheses
which in the past has characterized
this research area.


The appearance of Loevinger’s
paper (2) on the attenuation paradox 
in test theory was the precipitating 
factor in the writing of this note.? In 
reacting to her development of the 
paradox (supposed lack of monotonic 
relationship between reliability and 
validity) certain biases concerning 
test theory and test statistics which 
the writer has held for several years 
were crystallized. 

Bias Number One. Let's forget our 
fixation on the normal curve in test 
theory. 

Bias Number Two. Let's use sta- 
tistics appropriate for rank-order, 
point distributions. 

In support of these biases the fol- 
lowing two arguments are offered: 

1. Test score distributions are 
rank-order, point distributions. The 

underlying trait may or may not be
continuously and normally distrib- 
uted, but such speculation is of no 
import. Psychological tests furnish 
rank-order information only and 
furthermore, we have few prospects
of obtaining devices of any other 
type. Criteria, on the other hand,
may occasionally be continuously
distributed and certain of these distributions
may be normal, but criteria
also are more frequently in the
form of tests, ratings, rankings, pass-fail,
and other point distributions.

Personnel and Training Research Center.
is granted for reproduction, translation, publication,
use, and disposal in whole or in part
by or for the United States Government.

* The writer is indebted to Drs. Robert
Travers, John Leiman, and John Schmid,
Jr., for critical reading of this manuscript.

2. If no assumption is made con- 
cerning the shape of the criterion 
distribution in the work of Loevinger 
(2), Brogden (1), and Tucker (4), 
there is no paradox. For example, if 
all items in a test have difficulty val- 
ues of .5 and if all intercorrelations 
of items are equal, the relationships 
contained in Table 1 between number
of items, item reliability, or level
of interitem correlations, and validity
of total scores are obtained. It is
seen that the relationship between
reliability and validity is monotonic.


TABLE 1

VALIDITY AS A MONOTONIC FUNCTION
OF RELIABILITY

Item              Item        Test Validity
Intercorrelation  Validity    9 Items   18 Items   45 Items   90 Items
1/9               √(1/9)      .73       .83        .92        .96
2/9               √(2/9)      .85       .91        .96        .98
3/9               √(3/9)      .90       .95        .98        .99
4/9               √(4/9)      .94       .97        .99        .993
5/9               √(5/9)      .96       .98        .991       .996
6/9               √(6/9)      .97
7/9               √(7/9)      .98
8/9               √(8/9)

Discussion. In obtaining the above
results the same assumption about
item validity made by preceding
writers was used, i.e., each item, except
for errors of measurement, is a
true measure of the criterion. This
means that the validity of an item is
the square root of its reliability. In 
the present case reliability is indi- 
cated by the phi coefficient between 
items in the test, and the validity is 
a point biserial between the item and 

“true” score. The values in the table
are those obtained by applying the 
usual formula for the correlation of 
sums. Please note that here and else- 
where, when the term “correlation” 
is used, a product-moment correla- 
tion is assumed. 
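
The computation behind Table 1 may be sketched as follows. The Python code assumes standardized items of equal intercorrelation r, each with validity √r, and applies the usual formula for the correlation of a sum with an outside variable; the function name `composite_validity` and the column sizes of 9, 18, 45, and 90 items are taken as assumptions consistent with the legible entries of the table.

```python
import math


def composite_validity(n_items, r):
    """Correlation of an n-item total score with the criterion.

    Assumes unit-variance items, a common intercorrelation r, and a
    common item validity equal to sqrt(r).
    """
    item_validity = math.sqrt(r)
    sum_of_validities = n_items * item_validity
    # Variance of a sum of n unit-variance items with common correlation r.
    variance_of_sum = n_items + n_items * (n_items - 1) * r
    return sum_of_validities / math.sqrt(variance_of_sum)


for k in range(1, 9):
    r = k / 9
    row = [composite_validity(n, r) for n in (9, 18, 45, 90)]
    print(f"{k}/9", " ".join(f"{value:.3f}" for value in row))
```

For any fixed number of items the result rises steadily with r, which is the monotonic relationship the table is meant to display.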

The reader may have difficulty vis- 
ualizing the shapes of these criterion 
distributions since the definition of 
true is tied to the concept of infinity. 
The amount of error isn’t great in 
any derivation, however, if one substitutes
a large number for infinity.²
One thousand items will give results 
reasonably comparable to infinity 
—ten thousand would be eminently 
safe—and the shape of the distribu- 
tion can actually be worked out. Suf- 
fice to say, however, that a criterion 
distribution, as defined for Table 1, 
will not be normal when all items 
have difficulty values of .5 unless 
item intercorrelations are zero. For 
the same item difficulty specifications
the distribution becomes rectangular
when item correlations reach
1/3 and becomes increasingly U shaped
as item correlations increase from 1/3
to 1.00. 
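
One way to work out such shapes is sketched below in Python. It rests on an assumption that is not made in the text: that the equally correlated items arise from a beta-binomial model, in which each examinee has his own probability of passing an item and these probabilities follow a beta distribution. Under that model an item intercorrelation of rho at difficulty p corresponds to beta parameters a = p(1 − rho)/rho and b = (1 − p)(1 − rho)/rho. The name `score_distribution` is hypothetical.

```python
import math


def score_distribution(n_items, rho, p=0.5):
    """Beta-binomial distribution of total scores (assumed model).

    rho is the common item intercorrelation, with 0 < rho < 1, and p is
    the common item difficulty.
    """
    if not 0 < rho < 1:
        raise ValueError("rho must lie strictly between 0 and 1")
    a = p * (1 - rho) / rho
    b = (1 - p) * (1 - rho) / rho

    def log_beta(x, y):
        return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

    pmf = []
    for k in range(n_items + 1):
        log_choose = (math.lgamma(n_items + 1) - math.lgamma(k + 1)
                      - math.lgamma(n_items - k + 1))
        pmf.append(math.exp(log_choose + log_beta(k + a, n_items - k + b)
                            - log_beta(a, b)))
    return pmf


for rho in (0.05, 1 / 3, 0.6):
    pmf = score_distribution(20, rho)
    print(round(rho, 3), [round(value, 3) for value in pmf[::5]])
```

Under this assumed model the distribution is peaked in the middle for low intercorrelations, exactly rectangular at 1/3, and U shaped beyond that point, in line with the statements above.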

The importance of the assumption 
concerning normality of criterion dis- 
tribution is made clear in Table 2. 


2 The mathematician, H. T. Davis, made 
this suggestion in principle in a class at 
Indiana University in 1935-36. He stated 
that if mathematicians substituted a “very 
large number” for infinity in their calculations 
they would not obtain significantly different 
answers and their assumptions would have 
operational meaning. This suggestion seems 
peculiarly appropriate for test theory. For 
the latter theory the number does not need 
to be nearly as large as Dr. Davis envisioned. 


TABLE 2

COMPARISON OF ITEM VALIDITIES WITH AND
WITHOUT THE ASSUMPTION OF CRITERION
DISTRIBUTION NORMALITY

Item Reliabilities    Item Validities             Comparison Values
r_tet     Phi         √r_tet        √phi          r_pbis as computed
                      or r_bis      or r_pbis     from r_bis
.10       .063        .316          .251          .251
.20       .128        .447          .358          .358
.30       .194        .548          .440          .438
.40       .262        .632          .512          .506
.50       .333        .707          .577          .566
.60       .410        .775          .640          .620
.70       .493        .837          .702          .669
.80       .590        .894          .768          .716
.90       .713        .949          .844          .759


This table was constructed by first 
assuming certain item reliabilities 
stated in terms of the tetrachoric cor- 
relation. These values are in Column 
1. Column 2 contains the correspond- 
ing phi coefficients for the same items. 
Column 3 contains the item validi- 
ties, stated as continuous biserials, 
when the criterion is assumed to be 
a true, normally distributed measure 
of the function measured by the 
item. Values in Column 3 are comparable
to item validities used previously
by Brogden and Loevinger.
Column 4 also contains item validities,
stated as point biserials, but the
criterion is assumed to be the sum of 
an infinite number of items, of a 
given level of reliability, whose dis- 
tribution takes the shape dictated 
by their intercorrelations. Column 
5 also contains point biserials, but 
these were computed from the con- 
tinuous biserials in Column 3, which 
were based on an assumed normal 
distribution, by multiplying each by 
the expression $z/\sqrt{pq}$, where $z$ is the ordinate
of the normal curve at the point of dichotomy. Comparison
of Columns 4 and 5 shows how the
assumption of normality attenuates 
item validities with the error becom- 
ing progressively larger with higher 
validities. 
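
The columns of Table 2 can be reproduced approximately with the Python sketch below. Two assumptions are made that the text does not state: phi is obtained from the tetrachoric correlation by the relation phi = (2/π) arcsin(r_tet), which holds when both items are split at the median, and the conversion factor is taken as z/√(pq) with p = q = .5. The function name `table2_row` is hypothetical.

```python
import math


def table2_row(r_tet):
    """Return (phi, r_bis, r_pbis, converted) for items of difficulty .5."""
    # Median-split relation between tetrachoric r and phi (assumed).
    phi = (2 / math.pi) * math.asin(r_tet)
    r_bis = math.sqrt(r_tet)   # Column 3: validity against a normal criterion
    r_pbis = math.sqrt(phi)    # Column 4: validity against the summed items
    p = q = 0.5
    z = 1 / math.sqrt(2 * math.pi)  # normal ordinate at the median cut
    converted = r_bis * z / math.sqrt(p * q)  # Column 5
    return phi, r_bis, r_pbis, converted


for tenths in range(1, 10):
    r_tet = tenths / 10
    print(f"{r_tet:.2f}", " ".join(f"{v:.3f}" for v in table2_row(r_tet)))
```

The values so obtained agree with the tabled entries only to within a few thousandths in Column 5, so the sketch should be read as an approximation to the original computations rather than a reconstruction of them.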
Similar tables can readily be com- 
puted for other levels of item difficulty.
Again, there is no paradox,
but the criterion distributions are 
skewed as well as flat when item inter- 
correlations are greater than zero. 
The assumption of a normal distribu- 
tion of the criterion is not compatible 
with the mechanics of adding test 
items together. 


item diffi- 
the center 
S counters 
elations. 

gue that a 
ce classical 
ow for the 
ribution re- 


r the classical formulas
to be applicable. The locus
of the paradox can, however, be more
precisely stated. In order for the relationship
between validity and reliability
to hold, one cannot keep constant
both the form of
the criterion
distribution and the distribution of
difficulties of the test items.


Conclusion for test construction.


The test technician should proceed 
with the job of test construction with- 
out making obeisance to the normal 
curve. His decisions should be made 
in sequential fashion from most to 
least important. The shape of his 
test score distribution is a decision 
made late in the sequence and his de- 
sires about shape of distribution 
should not lead to reversals of earlier, 
more important decisions. It should 
also be noted that all of his decisions 
are made with a particular group of 
examinees in mind, since the level 
and range of their ability are crucial 
factors in the writing and selection of 
test items.
The first step in test construction 
is to draw up specifications for the 
test. Decisions made at this stage 
should not be changed, uncon- 
sciously, by later statistical computa- 
tions of the sort used in item analysis, 
measures of test reliability, and
measures of test homogeneity. Blind 
application of statistical procedures 
may change the nature of the test. 
For example, the test may be designed
to predict a particular complex
criterion. Items will then be included
in numbers such that their
weight in the total score will be optimum
for the purpose. Selection of
items on the basis of correlation of
items against total test score would
obviously be inappropriate. A Kuder-Richardson
homogeneity coefficient
would also be inappropriate for the
test as a whole.
One may also be interested in 
measuring a psychological “trait.”
In this case the tendency is to think
of the problem in terms of the homogeneity
of the items on the
grounds that a heterogeneous test by
definition cannot measure a unitary
trait. If homogeneity is defined as
level of item intercorrelations, how- 
ever, there is again the possibility of 
error in the blind following of statisti- 
cal indices. Let us suppose that a 
mechanical information test is de- 
sired. The following are possible ex- 
amples of such tests in descending 
order of item intercorrelations (dif- 
ficulty level of items being held con- 
stant): (a) Information about the 
crosscut saw and its use; (b) Infor- 
mation about saws and their uses; 
(c) Information about woodworking 
tools and their uses; (d) Information 
about tools and their uses in wood- 
working, plumbing, metal working, 
automotive repairing, etc. 

For many purposes test d may be 
most desirable, though its homogene- 
ity as defined above is lower than for 
the other tests in the series. This 
means that the test specifications 
must indicate how broadly this test 
should be defined. Even a fairly 
broad test may be relatively homogeneous,
however, in that the items in
the test may still be more like each 
other than items in other tests in the 
same battery (3). 

High item reliability is always de- 
sired. Nothing is gained from low 
item reliabilities. The reader must 
remember, however, that item relia- 
bility is defined as the correlation 
with another comparable item, and 
is not estimated from correlations 
with all other items in the test. Hence 
there is no contradiction between the 
present advice to achieve high item 
reliability and that given above 
which was to select a desired degree 
of homogeneity. The test constructor 
should, therefore, as his next step, 
write the most reliable items he pos- 
sibly can for the function he wants to 
measure. By necessity, though not 
from choice, item reliabilities will 
often be quite low because reliable 


measurement in many areas is diffi- 
cult. 

One cannot be as dogmatic about 
high test reliability as about high 
item reliability. Test reliability is a 
function of item reliabilities and item 
intercorrelations; i.e., test reliability 
is in part a function of homogeneity. 
High test reliability can be achieved 
by narrowing the focus of the test 
and attaining high homogeneity. 
Care must be exercised in item selec- 
tion, therefore, not to confuse item 
reliability and homogeneity and 
thereby change the function meas- 
ured by the test. The test technician 
must maintain his original specifica- 
tions in spite of temptations to in- 
crease test reliability. 

The next decision concerns the 
shape of the distribution of test 
scores desired. Depending on the 
purpose of the test, the desired dis- 
tribution may be normal, platykurtic, 
skewed, or U shaped. For a general 
purpose test the writer submits that 
a rectangular distribution is most 
useful since this distribution most 
accurately represents the information 
furnished by a psychological test. 
That is, the rank-ordering of persons 
is accomplished equally well in all 
parts of the range when the distribu- 
tion is rectangular. This means that 
reliability of discrimination is maxi- 
mized over all. 

The desired shape is achieved or, 
more commonly, approached by con- 
trolling the difficulty levels of the 
test items. Item difficulties alone are 
manipulated because previous decisions
have fixed the general level of
item intercorrelations that are possible.
With high item intercorrelations,
a constant level of item difficulty
will produce a U-shaped distribution.
As the variance of item difficulty
increases, the peaks of the U-shaped
distribution will converge to
the center of the distribution. 

The reader should be warned that 
some of the shapes of test score dis- 
tributions are highly theoretical in 
terms of the characteristics of items 
available for most measurement pur- 
poses. One practical outcome, how- 
ever, is to question the decision made 
automatically by most test con- 
structors to vary the difficulty levels 
of the items in the test. With low 
item intercorrelations of the sort ob- 
tained in most aptitude tests only by 
careful selection of the most reliable 
items at a constant level of difficulty 


can a rectangular distribution be 
approached.


SUMMARY 


1. The attenuation paradox in
test theory is a result of the assumption
made by previous writers of a
continuous normal distribution of the
criterion.

2. There is no paradox if the criterion distributions
shape. If this is
facto paradoxical, th
the paradox is in o
hold constant both
criterion distribution and the distribution
of item difficulties.

3. The pervasive use of the assumption
of continuous normal distributions
in test theory and test
statistics is questioned on grounds
that test data are in the form of rank-order,
point distributions.

with variation in the distribution of item
difficulties, number of items, and degree

dox in test theory, Psychol. Bull. 1954
51, 493-504,

4. The test technician should make 
decisions in constructing a test in a 
particular sequence. This sequence 
is as follows: 

a. Outline his test specifications. 
This will specify the desired degree 
of homogeneity (level of item inter- 
correlations) wanted in the test. High 
homogeneity is not necessarily desirable.

b. Write the most reliable items 
possible to measure the desired func-
tion or functions. Items of low relia- 
bility are never desired. 

c. Do not always try to maximize 
test reliability, since the latter is a 
function both of item reliability and 
homogeneity. The desired degree of 
homogeneity should be maintained 
even if item-test correlations are low. 

d. Select the form of the raw score
distribution of test scores desired.
This can be any form, though a rectangular
distribution is recommended
for a general purpose test.

e. Strive to obtain the desired form
of distribution by varying item difficulties
only. Previous and more important
decisions have fixed the level
of item intercorrelations, which is the
other determiner of shape of distribution.


In the tracking task, or indeed in
any task requiring adjustment to
moving objects, the operator is often
confronted with targets which, while
preserving their general directions,
change their velocities. It is the ob- 
ject of this survey to describe the ex- 
perimental literature which deals 
with responses that are made to such 
accelerated motion. Of particular 
interest in relation to tracking be- 
havior is the extent of acceleration 
which must occur in order for it to 
be noticed. The tracker’s ability to 
match target velocities with his own 
movements must depend in part on 
his sensitivity to change in velocity. 
Other information, though less obviously
applicable, may help in completing
the picture of how the operator
responds to changing velocities.

All of the findings on the topic of 
response to target acceleration that 
the writer has been able to unearth 
are included in the present review. 
Actually, very few studies have dealt 
with this problem. Further, it was 
not always the primary focus of the 
investigation. For this reason, in dis- 
cussing a study scant consideration 
may be given to its major objective. 
Instead, the aspects which bear on 
the present subject matter will be 


emphasized. 


1 This research was supported by the United 
States Air Force under Contract No. 33(616)- 
2024 with the University of California, monitored
by the Aero Medical Laboratory of
Wright Air Development Center. Permission 
is granted for reproduction, publication, use 
and disposal, in whole or in part by or for the 
United States Government. 


The making of a systematic evalu- 
ation of the present status and future 
possibilities of work on response to 
acceleration necessitates the locating 
of this problem within the more gen- 
eral framework of response to target 
motion. To do this it will be necessary
to describe some aspects of studies
on constant-velocity motion. However,
there is no intention of making
the present survey broader than is 
shown by the title. Consequently 
such important topics as perceived 
motion from discrete stimuli, induced 
motion, one’s interpretation of his 
own relative motion, etc., will receive 
no mention. A fairly strict limitation 
of subject matter is mandatory be- 
cause the problems of perception and 
action in relation to motion include 
all of the variables of the stationary 
environment in addition to those in- 
troduced by motion. 


DESCRIPTION OF RELEVANT STUDIES 


The complication experiment. The 
classical complication experiment, 
with its ancestry in the personal 
equation of Bessel (2, p. 133), is the 
first of the situations in which the 
effect of acceleration of motion on 
judgment was studied. Wundt 1) 
using a complication pendulum, at- 
tempted to judge the position of the 
pointer at the sound of a bell stroke. 
In Wundt's arrangement, the pointer 
oscillated symmetrically about the 
straight-up position. Figure 1 shows 
a top-pointing pendulum at two posi- 
tions of its motion in the upper 
sketches. It was found by Wundt 
that during the positively accelerated
to about 21 per cent. In some cases,
changes of as little as 2.5 per cent 
were detected significantly better 
than chance. Reducing the view- 
ing period to as short a time as 0.5 
sec. did not reduce the accuracy of 
judgments. However, presenting 
two targets which crossed before each 
changed velocity in identical fashion 
did elevate the threshold somewhat. 

Phenomenological description of har- 
monic motion. Two investigators 
have obtained phenomenological reports
based on the presentation of
harmonic motion. Metzger (17) used 
a technique in which a fixed light 
source threw shadows of moving ob- 
jects upon a translucent screen, on 
the other side of which was the S.
One or more vertical rods mounted on
a horizontal turntable provided targets,
each rod giving rise to a shadow
which moved from side to side in a
sinusoidal fashion. Metzger found a
great preference for continuous paths 
of motion. For example, after two 
shadows joined together, the two 


Present in- 
Cent investigations by 
Johansson (14). This investigator
also used a shadow t 
junction with a tra 
However, he Paste 


eccentric drive. 
this study and of Johansson’s subse- 
quent major study (15) concerns the 
problem of perceptual organization
of all major elements in a field, rather 
than observations on acceleration. 


Nevertheless, there is a clear state- 
ment regarding the perception of ac- 
celerated motion (14, p. 32). 


When an O is shown this kind of motion 
passing through a homogeneous field, and is 
asked how the velocity of the object behaves 
if different parts of the path of motion are 
compared one with another, or in other words, 
if the velocity changes along the path, practi- 
cally without exception the same answer is 
received: the point moves slowly just at the
turning point; but otherwise its velocity is
constant.


DISCUSSION 


General observations on studies re- 
ported. It is evident from the forego- 
ing survey that the information on re- 
sponse to acceleration of target mo- 
tion is meager. Nor are the small 
caches of knowledge strategically 
placed for either theoretical or practi-
cal purposes. Above all, little as yet
can be stated in quantitative terms. 

The statement by Johansson may 
be broadened to include motions 
other than harmonic in the following 
way: When target velocity changes 
gradually, a person can tolerate a 
good deal of such change without re- 
alizing that the speed is not constant. 

This generalization may prove to be
of value in the development of a 
coherent point of view. A closely re- 
lated but not identical suggestion is 
that the operator's perceptual mech- 
anism integrates smoothly changing 
velocities over a considerable period 
of time. Evidence for this conclusion 
was obtained by the writer in his stud-
ies of prediction-motion. It is also the
view of the writer that the early work
on the complication pendulum may
be explained partially in terms con-
sonant with the foregoing formula-
tions. First, S must require some time 
after the instantaneous signal to be- 
come aware of it: an appreciable reac- 
tion time is one of the most predicta- 
ble aspects of behavior. Next, it is 
clear that S allows for his reaction time
in making his judgment of sound-pen- 
dulum coincidence, otherwise his 
error would always be positive. This 
is not the case: during the positively 
accelerated phase of harmonic mo- 
tion, S judges the pendulum position 
to be an earlier one than is actually 
correct. It is hypothesized that S 
attempts to extrapolate backward to 
the extent of one reaction-time inter- 
val. If he uses the velocity existing 
during a brief period after the in- 
stantaneous signal for the operation, 
apparently being content to disre- 
gard the fact that the velocity is 
changing, the obtained results would 
be expected. In the phase of positive 
acceleration he appears to base his 
extrapolation upon velocities that 
have become too high for correct 
localization and in the phase of nega- 
tive acceleration upon velocities that 
have become too low. It should be 
pointed out that the same qualitative 
predictions would be made by assum- 
ing that S computes rates by instan- 
taneous differentiation rather than 
by integrating over a period of time. 
The quantitative data do not allow 
selecting between the alternatives. 
In any event, S acts very much as 
though he were unaware that the 
velocity is changing. 
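
The backward-extrapolation hypothesis can be illustrated numerically. The following is only a sketch under stated assumptions: the pointer is taken to move as a pure sinusoid, S is assumed to become aware of the signal exactly one reaction time late, and S is assumed to extrapolate backward with the single velocity present at that moment. The function name `judged_error` and the values of reaction time, amplitude, and period are hypothetical and are not taken from any of the studies cited.

```python
import math


def judged_error(t_signal, reaction_time=0.2, amplitude=1.0, period=2.0):
    """Judged minus true pointer position at the instant of the signal.

    Hypothetical model: S notices the signal one reaction time late and
    extrapolates backward using the velocity present at that later moment,
    ignoring the fact that the velocity has been changing.
    """
    omega = 2.0 * math.pi / period
    t_aware = t_signal + reaction_time
    position_now = amplitude * math.sin(omega * t_aware)
    velocity_now = amplitude * omega * math.cos(omega * t_aware)
    judged = position_now - velocity_now * reaction_time
    true = amplitude * math.sin(omega * t_signal)
    return judged - true


# In both examples the pointer is moving in the positive direction.
speeding_up = judged_error(-0.3)   # before midcourse: speed is increasing
slowing_down = judged_error(0.1)   # after midcourse: speed is decreasing
print(speeding_up, slowing_down)
```

Under these assumptions the error is negative (an earlier position is judged) while the pointer is speeding up, and positive while it is slowing down, which is the qualitative pattern described above.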

The study by Hick, on the other 
hand, shows S to be an extremely ac- 
curate discriminator of velocities, one 
who can sometimes detect an increase 
of 2.5 per cent and who needs no more 
than 0.25 sec. either before or after 
the change to make the discrimina- 
tion. How may this description be 
reconciled with that of the uncritical 
S who can tell that harmonic motion 
is changing in velocity only at the 
periods near reversal of direction? 
The obvious difference between the 
stimulus conditions for the disparate 
observations is in the gradualness of 
transition from one velocity to an-
other. During a course of harmonic 
motion, acceleration is least just at 
the times that velocities are greatest, 
in the region of midcourse. Because 
there is so little relative acceleration 
in this region, it would be expected 
that it would go unnoticed. The re- 
verse holds true near the ends of the 
motion where the ratio of accelera- 
tion to velocity becomes high and 
finally approaches a value which is 
infinitely high. At the time of an in- 
stantaneous change of velocity, as 
introduced by Hick, acceleration is 
naturally infinitely high. 
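
The behavior of the ratio of acceleration to velocity in harmonic motion may be made explicit with a short derivation. The symbols $A$ (amplitude) and $\omega$ (angular frequency) are introduced here for illustration and do not appear in the studies reviewed; time is measured from the midcourse position.

```latex
\begin{align*}
s(t) &= A \sin \omega t, \\
v(t) &= \frac{ds}{dt} = A\omega \cos \omega t, \\
a(t) &= \frac{dv}{dt} = -A\omega^{2} \sin \omega t, \\
\left| \frac{a(t)}{v(t)} \right| &= \omega \, \lvert \tan \omega t \rvert .
\end{align*}
```

At midcourse ($\omega t = 0$) the ratio is zero, and as the target approaches either turning point ($\omega t \to \pm\pi/2$) it increases without bound.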

Analysis of thresholds already de- 
termined. It may be profitable to 
consider the response to target ac- 
celeration in the general context of 
research on the perception of motion. 
This approach should be particularly 
useful in clarifying the language and
problems in the determination of
thresholds. A systematic analysis of 
thresholds both obtained and obtain- 
able might accomplish several things. 
First, the very organization of the 
material should indicate the voids as 
well as the islands in our present 
knowledge. Second, it should de- 
lineate the operations necessary for 
the obtaining of thresholds, and the 
primary variables whose values must 
be specified. Third, it should reveal 
parallels among thresholds which 
have been studied and suggest extra- 
polations to kinds of motion beyond 
the scope of the original studies. A 
graphic technique and a system of 
notation were devised for conducting 
this analysis. 

The significant kinds of threshold 
relating to motion which are de- 
scribed in the literature are repre- 
sented in Fig. 2. The first has been 
called the threshold of motion (shown 
in 2A). Angular distance as a func- 
tion of time is represented in the 
graph on the left. On the right, the
function is that of angular velocity
against time. Gordon (7), in a recent
experimental study, defines the thresh-
old of motion as the lowest detectable
angular velocity (distinguishing it
from the threshold of displacement,
which is the smallest angular distance
over which motion of a given rate
may be detected). In place of motion,
which is a general word, the present
writer would substitute velocity, the
unit in which the threshold is meas-
ured. Further, since this is an abso-
lute threshold, it should be called ab-
solute threshold of velocity. Represent-
ed in both graphs of Fig. 2A are three
test motions: f, g, h. Motion f is shown
as being more rapid than is necessary
for detection, and Motion h is too
slow to be detected. Motion g (the
dark line) is that whose velocity just
permits detection. In the graph on
the left, the threshold is the slope of
the g line, or $ds_g/dt$; in the graph on
the right, the threshold is shown by
the height of the g line, or $v_g$. The
motions shown have been equated in 
time but could as logically have been 
equated in distance. Also, the par- 
ticular time used is an arbitrary de- 
cision of the experimenter. Conse- 
quently, the value of either the fixed 
time or distance employed must be 
specified. Troland reports that the 
minimum values generally found for 
this threshold lie between 1’ and 
2'/sec. (19, p. 380). 

Whereas the foregoing threshold 
is an absolute threshold, indicative of 
S's accuracy in distinguishing be- 
tween motion and no motion, that 
represented in Fig. 2B is a difference 
threshold. It is a measure of how 
well S distinguishes between two mo-
tions of different velocities. This
problem has not been studied in as
fine detail as the absolute threshold.
In the course of a series of investiga-
tions by J. F. Brown on the percep-
tion of motion, the study by Brown
and Mize (4) furnishes an expression
of the accuracy of Ss in equating the
velocities of two sets of moving
squares on endless belts. The pro-
portional difference required for dis-
crimination (Weber fraction) given
by the writers is 0.024. However, it
should be noted that this value actu-
ally refers to the constant error (or
bias) obtained with the method of
limits and so does not carry the im-
plied meaning. If the standard and
test motions are shown under the
same conditions, there should be no
over-all bias whatsoever, but this
obviously does not mean that the
matching is perfectly accurate.

Another variety of study bearing a 
relation to difference threshold is 
that of the motion parallax cue of 
distance discrimination. If two ob-
jects move at the same linear veloc- 
ity, it is possible to tell which is 
nearer because it will have a higher 
angular velocity. A study by Graham 
et al. (11) shows that for two vertical 
needles moving horizontally at right 
angles to the line of regard, a differ- 
ence in distance of the needles from 
the S which gives rise to a differential 
angular velocity of about 30”/sec. 
will provide a threshold distance dis- 
crimination. 
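
The relation between a difference in distance and a differential angular velocity can be sketched as follows. This is an illustration only: it assumes the small-angle relation (angular velocity equals linear velocity divided by distance) at the moment the target crosses the line of regard, and the velocity and distances used are hypothetical values, not those of Graham et al. The function names are likewise introduced here for illustration.

```python
import math

ARCSEC_PER_RADIAN = 180.0 * 3600.0 / math.pi


def angular_velocity_arcsec(linear_velocity, distance):
    """Angular velocity (seconds of arc per sec) of a target crossing the
    line of regard at right angles, assuming omega = v / d."""
    return linear_velocity / distance * ARCSEC_PER_RADIAN


def differential_angular_velocity(linear_velocity, near_distance, far_distance):
    """Difference in angular velocity between a nearer and a farther target
    that share the same linear velocity."""
    near = angular_velocity_arcsec(linear_velocity, near_distance)
    far = angular_velocity_arcsec(linear_velocity, far_distance)
    return near - far


# Hypothetical values: both needles move at 1 cm/sec and are viewed
# from 100 cm and 100.5 cm.
difference = differential_angular_velocity(1.0, 100.0, 100.5)
print(difference)
```

The nearer needle always has the higher angular velocity, so the difference is positive; a threshold stated as a differential angular velocity can be converted back to a difference in distance by inverting the same relation.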

The graphs for difference threshold
of velocity are shown in Fig. 2B. The
axes carry the same meaning as in A.
The dotted line, j, represents the
standard motion; f, g, and h repre-
sent three test motions which are
respectively more divergent from the
standard than is necessary for dis-
crimination, just detectably different
(at the threshold), and too much like
the standard to be distinguished from
it. On the left, the threshold is the
difference in slopes between test line
g and standard line j, or
$ds_g/dt - ds_j/dt$. On the right the threshold is
shown by the difference in height be-
tween lines g and j, or $v_g - v_j$. In addi-
tion to the specification of time of 
presentation, it is evident that as the 
standard velocity is arbitrary, it too 
must be specified. As the graphs are 
merely illustrative no attempt has 
been made in Fig. 2 (or Fig. 3) to 
maintain equivalent scales on the 
left and right sides. 

A somewhat related experiment 
should be mentioned. This is the field 
study by Biel and G. E. Brown (1), in 
which Ss were asked to estimate the 
linear velocities of various airplanes 
during their courses of motion. Low 
velocities were overestimated and 
high ones were underestimated. The 
S's knowledge of the performance 
characteristics of the several types of 
aircraft used had considerable influ-
ence on his judgments. Such a study
could be represented in Fig. 2B by 
showing the actual rate of the plane 
as the dotted line, S’s mean judg- 
ment as a solid line, and his variabil- 
ity as a zone about his mean judg- 
ment. It also would be necessary to 
have the y axis represent linear in- 
stead of angular distance. 

As may be seen in Fig. 2C, the Hick 
experiment on the detection of in- 
stantaneous change of velocity may 
be represented formally in much the 
same manner as experiments on the 
difference threshold (Fig. 2B). The 
standard motion is now shown to pre- 
cede the test motions. Of course, on 
any one trial only one of the alterna- 
tives follows the standard. As the 
times of presentation of the standard 
and test motions may be varied inde- 
pendently of one another, both must 
be specified in addition to the veloc- 
ity of the standard motion. 

Thresholds of acceleration. As was 
pointed out in the previous discus- 
sion, the discrepancy between Hick’s 
results and those of investigators us- 
ing harmonic motion could be at- 
tributed to difference in the response 
to smooth change in velocity and to 
discontinuous change. As far as 
thresholds of acceleration are con- 
cerned, further work with harmonic 
motion would appear to be of limited 
value as this motion is a single com- 
plex case in which the extent of ac- 
celeration runs the gamut during each 
cycle. Also, all higher orders of deriv-
atives are present as well as accelera-
tion. The detection of instantaneous
change, although dealing with simple
linear velocities (except at the point
of change), is also a special case in
which acceleration takes on an in-
finite value. The general case for
study would be that in which there
was a constant amount of accelera-
tion during a motion. Test motions
with different amounts of accelera-
tion could then be compared. This
would parallel the work on the abso-
lute and difference thresholds of
velocity. The kind of motion to be
studied is necessarily that which is
represented by the equation
$s = nt + mt^2$, the equation which provides
for a constant amount of acceleration.
(Here s is usually measured in angu-
lar units and t in seconds of time.)
No description has been reported 
of an experimental determination of 
threshold of acceleration, although 
Hick and Bates (13) report an im- 
pression gained from preliminary in- 
vestigation that rate must double 
every five seconds for acceleration to 
be noticed. Several different experi- 
mental procedures suggest themselves 
for obtaining absolute threshold of 
acceleration. Some of these may be 
mentioned. First, there could simply 
be judgment by S for each of the 
test motions as to whether the mo- 
tion was accelerated. Second, stand- 
ard constant-velocity motion could 
be presented in paired trials with the 
various test 
to decide which member 


eral test motions will be used which 
differ j 


In Fig. 3A the graphs represent a 
determination of the absolute thresh- 
old of acceleration, where each test 
motion is compared with a constant- 
velocity standard, which is shown by 
the dotted line. As in Fig. 2, the
three solid lines, f, g, and h, represent
motions which are above threshold,
just at the threshold, and below the
threshold. In the graph on the left,
the threshold is represented by the
second derivative of the function of
the line g, $d^2s_g/dt^2$. In the graph on
the right the threshold is the slope
of line g, $dv/dt$.

It may be noted that a manipula- 
tion was possible in this experiment 
which was not possible in the de- 
termination of the thresholds of 
velocity: the motions differ only in 
one respect, acceleration, but are the 
same in time and distance. When 
experimenting upon the velocity 
thresholds, the test velocities are 
naturally different. But also if the 
motions are equated in time they 
must differ in distance and vice versa. 
Comparison may be made between 
the right-hand side of Fig. 3A and 
the left-hand side of Fig. 2A. In 
both, the equations are seen to be 
linear, and the statements of thresh- 
old are parallel, $dv/dt$ as compared
with $ds_g/dt$. The intersection of all
the lines at the same point in Fig. 3A 
shows how it was possible to equate 
velocity. A similar arrangement in
the case of Fig. 2A would simply 
mean that the motions would center 
about the same position, a considera- 
tion which is irrelevant in the de- 
termination of thresholds. As far as 
required specifications for the abso- 
lute threshold of acceleration are 
concerned, since acceleration may be 
varied independently of both time 
and distance, the particular time- 
distance combination used must be 
specified rather than only one or the 
other as in the case of the absolute 
threshold of velocity. 
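
The possibility of equating accelerated motions in both time and distance can be checked with a small sketch. It assumes motions of the form s = nt + mt^2, for which the constant acceleration is 2m; fixing the total time T and total distance D then gives n = D/T - mT. The function names and the numerical values of T, D, and the accelerations are hypothetical and serve only as an illustration.

```python
def matched_motion(total_time, total_distance, acceleration):
    """Return (n, m) for s = n*t + m*t**2 covering total_distance in total_time.

    The constant acceleration equals 2*m; n is the initial velocity.
    """
    m = acceleration / 2.0
    n = total_distance / total_time - m * total_time
    return n, m


def position(n, m, t):
    """Distance covered at time t."""
    return n * t + m * t ** 2


def velocity(n, m, t):
    """Velocity at time t, the first derivative of position."""
    return n + 2.0 * m * t


T, D = 2.0, 10.0  # hypothetical duration (sec) and angular distance (deg)
for accel in (0.0, 1.0, 3.0):
    n, m = matched_motion(T, D, accel)
    print(accel, position(n, m, T), velocity(n, m, T / 2.0))
```

Every motion built this way covers the same distance in the same time, and every velocity line passes through the average velocity D/T at the midpoint of the presentation, so that the velocity lines share one point of intersection.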

Also as yet undetermined is the dif- 
ference threshold of acceleration. 
What would be desired is a measure 
of the accuracy with which S is able 
to distinguish between one extent of 
acceleration and another. In Fig. 3B 
the operations involved in determin- 
ing such a threshold are shown. A 
motion with a standard acceleration 
is shown by the dotted line and the 
usual three test motions by the solid 
lines. As in the case of absolute 
threshold, it is possible to equate the 
motions in both time and distance. 
In the graph on the left, the threshold
is equal to the difference between the
second derivatives of the functions
represented by lines g and j:
$d^2s_g/dt^2 - d^2s_j/dt^2$. In the graph on the right
the threshold is equal to the differ-
ence in slopes of the g and j lines:
$dv_g/dt - dv_j/dt$. The same parallels
and differences exist between the 
right-hand side of Fig. 3B and the 
left-hand side of Fig. 2B as in the 
comparison made for Fig. 3A and 
Fig. 2A. As in the case of absolute 
threshold of acceleration, the particu- 
lar time-distance combinations used 
must be specified. In addition, the 
value of the arbitrary standard ac- 
celeration employed must be stated. 

The question may have occurred 
to the reader of whether it is really
worth while to determine a difference
threshold of acceleration. After all, 
there is no end to the order of deriva- 
tives of motion. Certainly at some 
point there must be an end to the 
utility of determinations of absolute 
and relative thresholds. Perhaps the 
real question concerns the kinds of 
discriminations the operator can 
make. Evidently, if values of the 
third derivative of distance are suffi-
ciently high, it too may be detected;
the intuitive term “jerk” has been 
applied to this characteristic by 
mechanical engineers. Perhaps it is 
beyond this point that the human 
operator has insignificant ability to 
discriminate. 

The constancy problem. One very 
important consideration has been 
slighted in all of the preceding discus- 
sion. It is that thresholds have been 
described in angular terms whereas 
an approximately linear path of mo- 
tion is probably more typical than a 
circular one. In the case of the abso- 
lute threshold of velocity the value 
for any given linear situation may be 
rather accurately specified in angular 
terms. This is because the arc which 
would be subtended is so small (often 
less than one second) that the angular 
rate is essentially constant through- 
out. Circular motions have been used 
predominantly in this work. In the
studies of difference threshold of 
velocity, the paths are necessarily 
longer. Linear paths have been used 
in most of the studies. Obviously the 
angular velocity must vary from 
point to point. Yet the statements 
of threshold are usually given in an- 
gular terms. The reason for this is 
clear. It is thus in order that the 
threshold may be stated independ- 
ently of S’s distance from the moving 
object. In the same way, it may 
prove to be of more importance or 
interest to determine thresholds of 
acceleration for targets moving in
linear rather than circular paths. The 
same solution would appear to be 
necessary; a threshold would be 
stated in average angular value for 
the course of motion. 
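
The variation of angular velocity along a linear path, and the average angular value proposed above, can be sketched as follows. The geometry is an assumption made for illustration: the target moves at constant linear velocity along a straight line whose nearest point lies at a given distance from S, and position along the path is measured from that nearest point. All function names and numerical values are hypothetical.

```python
import math


def angular_position(x, distance):
    """Bearing (radians) of a target at lateral offset x on a straight path
    whose nearest point lies `distance` from S."""
    return math.atan2(x, distance)


def angular_velocity(x, distance, linear_velocity):
    """Instantaneous angular velocity (rad/sec): v*d / (d**2 + x**2)."""
    return linear_velocity * distance / (distance ** 2 + x ** 2)


def average_angular_velocity(x_start, x_end, distance, linear_velocity):
    """Total angle swept divided by the time taken to travel the path."""
    duration = abs(x_end - x_start) / linear_velocity
    swept = angular_position(x_end, distance) - angular_position(x_start, distance)
    return abs(swept) / duration


# Hypothetical case: path from -10 to +10 units, passing 10 units from S.
center = angular_velocity(0.0, 10.0, 1.0)
end = angular_velocity(10.0, 10.0, 1.0)
average = average_angular_velocity(-10.0, 10.0, 10.0, 1.0)
print(center, end, average)
```

In this example the angular velocity at the ends of the path is half that at the center, and the average angular value falls between the two.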

The very fact that constant-veloc- 
ity linear motion is seen as such de- 
serves comment. It is a constancy 
phenomenon in the same sense as size 
constancy; equal linear extents are 
judged as equal when at different 
distances and thus differing in angu- 
lar extent. In the present case, the 
extent judged equal is linear velocity 
rather than linear distance. This 
point is not the same as that made 
by Johansson (14, p. 255), who refers
to the previously mentioned percep- 
tion of harmonic motion in which the 
velocity is taken to be unchanging 
for the greater extent of the motion 
as an example of constancy. There 
is no equating of equal linear extents 
but rather an inability to discrimi- 
nate among different velocities 
whether considered as linear or as 
angular. 

Constancy of motion was one of 
the problems investigated by J. F. 
Brown (3). His method was to have 
an S match velocities of moving ob-
jects which were at different dis-
tances, a procedure which corre- 
sponds exactly to the typical experi- 
ment on size constancy. It should be 
remarked that within each of the mo- 
tions in this experiment there must 
necessarily be the single-object con- 
stancy already identified; each ob-
ject, although taking on a range of
values of angular velocity, appears 
to move at a constant linear velocity. 

A parallel may be found in judg-
ments of static magnitudes. Shape
constancy can also be looked upon as
a type of single-object constancy.
There is a constancy situation even
when a large square is put at some
distance from the observer and di-
rectly before him in the frontal plane.
The angular distances are necessarily
less at the two sides and at the top
and bottom than in the center region.
To the writer’s knowledge the ex- 
istence of a constancy phenomenon 
of objects so situated has neither been 
studied nor mentioned previously. 
When two objects are compared in 
the experiment on size constancy, 
there also exists the single-object con- 
stancy (of shape) within each of the 
figures. 

A related point on single-object 
velocity constancy is that not only 
does a target which is moving parallel 
with the ground change its angular 
velocity, but it also changes its angu- 
lar elevation. Angular elevation is 
low when the target is far off and 
high when it is near. The fact that 
it is seen as maintaining a level path 
could be called single-object con- 
stancy of direction. It would be of 
interest to know whether and to what 
extent a tracker is influenced by his 
tendency to perceive objects as mov- 
ing in a world of rectangular coordi- 
nates when his controls (such as 
cranks and hand-wheels) operate 
from angular inputs. 

No matter what aspect of the prob- 
lem of response to target motion is 
examined, it will be evident that far 
less has been done than remains to
be done. Circular motion has been
studied in some situations but not
in others, similarly linear motion.
There have been a few rather special
studies of harmonic motion. How-
ever, motion paths of greater com-
plexity and in three dimensions have
attracted no investigation. The pau-
city of research on responses to ac-
celerated motion and the absence
even of discussion on higher order
derivatives of motion have already
been mentioned. The psychology of
response to target motion lies in the
future.


SUMMARY 


The experimental literature on re- 
sponses to acceleration of target mo- 
tion was reviewed. One significant 
observation was that smoothly ac- 
celerated motion is generally re- 
sponded to as if the velocity were 
constant. Suggestions were made of
a basic approach toward obtaining 
thresholds of acceleration. Examples 
of studies on constant velocity mo- 
tion were included in order to develop 
a systematic graphic method of de- 
scribing experiments on motion. The 
phenomenon of velocity constancy of 
a single moving target was identified 
and generalized. 


TRANSFORMED STATISTICS FOR USE IN
TEST CONSTRUCTION


In most test construction situa-
tions it is desirable, if not absolutely 
necessary, to select from the test 
items which are available either (a) 
those which contribute most to test 
reliability, or (b) those which have
the strongest relationship to an ex- 
ternal (criterion) variable, or else 
(c) those items which to some extent 
meet both requirements a and b. In 
any case, a relatively large number 
of item-test or item-criterion statis- 
tics will usually be required in order 
to identify the items which will com- 
prise the best final test, and the en- 
suing computation can be very labori- 
ous. A number of writers (1, 2, 4, 6) 
have reported on the merits of group- 
ing the test or criterion distributions 
into a relatively small number of 
symmetrical categories for the pur- 
pose of simplifying the computation 
of such item statistics. The chief 
advantage of such coarse grouping is 
the increased economy in time spent 
on computation, which at the same 
time is accompanied by a minimum 
loss of information. There appears 
to be no readily available literature 
containing formulas which are both 
economical to apply and at the same 
time utilize highly efficient grouping. 
This paper is intended partially to 
remedy this need. 

It is well known that when fre- 
quency distributions are grouped 
into broad categories, the informa- 
tion lost decreases the efficiency of 
statistics computed from such data. 
It can be shown, however, that the 
loss is less for some kinds of divisions 
into categories than it is for others.
Flanagan (2) has shown that group-
ing scores into symmetrically ar-
ranged categories is relatively effici-
ent when there are as many as five
or seven categories. For example,
seven categories containing, from low
to high scores, the percentages of
cases 4, 8, 25, 26, 25, 8 and 4, for
which the corresponding new scores
-3, -2, -1, 0, 1, 2 and 3, respec-
tively, have been assigned, will yield
a (maximum) variance due to dif-
ferences between categories of nearly
95 per cent. The maximum variance
between categories for five categories,
if scored -2, -1, 0, 1, and 2, occurs
when they contain, respectively, 9,
20, 42, 20 and 9 per cent of cases; it
is about 91 per cent. Traditionally
much item selection has been carried
out using distributions which have
been divided, as recommended by
Kelley (4), into only three categories
containing 27 per cent low, 46 per
cent middle and 27 per cent high
scores. In this case, the variance be-
tween categories is only 81 per cent.
For a given number of categories,
moderate variation in percentages of
cases assigned to different categories
lowers the maximum less than might
be expected.

Although a large number of item-
test or item-criterion relationships
may be required, only a measure of
the relative strength of such relation-
ships is most often needed. Because
of this fact, and because of empirical
evidence indicating high accuracy,
as well as high efficiency, for coarse
grouping methods (2), it would seem
worthwhile in most test-construction
problems to follow Flanagan's recom-
mendations: first to include a few
additional cases to offset loss in effi-
ciency, and then to apply a coarse
grouping transformation.
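
The between-categories percentages quoted above can be checked numerically. The sketch below is not taken from Flanagan or Kelley; it assumes that the original scores are normally distributed and computes the squared correlation between the assigned category scores and the underlying normal variable. The function name `variance_retained` is hypothetical, and `statistics.NormalDist` is the Python standard-library normal distribution.

```python
from statistics import NormalDist


def variance_retained(percentages, scores):
    """Squared correlation between category scores and an underlying
    standard normal variable, for categories listed from low to high."""
    normal = NormalDist()
    proportions = [p / 100.0 for p in percentages]
    # Ordinate of the normal curve at each category boundary
    # (zero at minus and plus infinity).
    ordinates = [0.0]
    cumulative = 0.0
    for p in proportions[:-1]:
        cumulative += p
        ordinates.append(normal.pdf(normal.inv_cdf(cumulative)))
    ordinates.append(0.0)
    # Proportion times mean normal deviate within category j equals the
    # ordinate at its lower boundary minus the ordinate at its upper one.
    weighted_means = [ordinates[j] - ordinates[j + 1]
                      for j in range(len(proportions))]
    covariance = sum(s * w for s, w in zip(scores, weighted_means))
    mean_score = sum(s * p for s, p in zip(scores, proportions))
    variance = sum(s * s * p for s, p in zip(scores, proportions)) - mean_score ** 2
    return covariance ** 2 / variance


print(variance_retained([27, 46, 27], [-1, 0, 1]))
print(variance_retained([9, 20, 42, 20, 9], [-2, -1, 0, 1, 2]))
print(variance_retained([4, 8, 25, 26, 25, 8, 4], [-3, -2, -1, 0, 1, 2, 3]))
```

Under the normality assumption these three calls should reproduce, to rounding, the figures of about 81, 91, and nearly 95 per cent given in the text.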


APPLICATIONS OF A PARTICULAR 
TRANSFORMATION 


Test (or criterion) means, and
numbers of subjects N and items n,
are invariants under the kind of area
transformations discussed above. In
any problem, because of the sym-
metrical nature of the new scores,
the means become zero; and n and
N, being independent of the cate-
gories, are constants of the trans-
formation. The variance of the new
scores is of course constant for any 
particular set of categories, even 
though it has been relatively increased 
by the coarse grouping (as well as 
absolutely reduced because of the 
smaller range of the new scores). 
Similarly, correlation coefficients com- 
puted from coarsely grouped scores 
are attenuated, so that, for example, 
if required item-test or item-criterion 
relationships are the typical point- 
biserial r's, then a correction should 
be applied. 

It is possible, however, to choose 
an efficient set of categories which at 
the same time contains proportions 
of cases such that the correction for 
coarse grouping is implicit in the 
formulas. A set possessing this com- 
putational advantage is one contain- 
ing 9, 19, 44, 19, and 9 per cents of 
cases, the new scores being, respec- 
tively, 2, 1, 0, -1 and -2. The be-
tween-categories variance for this 
transformation is 90.5 per cent, which 
is almost the maximum obtainable for 
5 categories. Formulas are given be- 
low for the more useful statistics after 
this particular transformation has 
been applied to test and criterion 


distributions. 


If the transformed test scores are
2, 1, 0, -1 and -2, then the covari-
ance $C_{iT}$ of the original scores of test
T with item i becomes the trans-
formed covariance,


$C_{iT}' = (2e + f - g - 2h)/N = D_{iT}/N.$ [1]


In [1] the frequencies of a (preferred
or correct) response for item i for
papers assigned scores 2, 1, -1 and
-2 are, respectively, e, f, g, and h.
Subsequently D's such as $D_{iT}$ will
always refer to differences like the
one in parentheses in [1], and primes
will always indicate other trans-
formed quantities.

Next we write the item-test point-
biserial correlation,


$r_{iT} = k r_{iT}'$, [2]


where k is the correction for grouping
the test scores into the broad cate-
gories. Assuming that the original
test scores T are approximately nor-
mally distributed, the value of k can
be shown (5, pp. 393-402) to be
1.051. The standard deviation for
the chosen score set, 2, 1, 0, -1 and
-2, to which correspond categories
containing percentages of cases 9,
19, 44, 19, and 9, is $S_T' = 1.049$. Using
these two values and [1] and [2], the
item-test correlation, $r_{iT} = C_{iT}/S_i S_T$,
is transformed as follows,


$\frac{C_{iT}}{S_i S_T} = k \left( \frac{C_{iT}'}{S_i S_T'} \right) = \frac{1.051}{1.049} \cdot \frac{D_{iT}}{N S_i} = \frac{D_{iT}}{N S_i}.$ [3]


Statistics such as $S_i$, the standard de-
viation of item i, which apply to the item
alone, are unaffected by the trans-
formation in [3]. Setting 1.051/1.049
equal to 1.000 (instead of 1.002) in
[3] introduces an error which for the
present problem is negligible.
Solving [3] for $C_{iT}$, we now obtain


$C_{iT} = S_T D_{iT}/N.$ [4]


Replacing subscript T by subscript 
C in the foregoing equations gives 
analogous transformed values for a 
criterion distribution C; for example, 
expressions analogous to [3] and [4] 
are 


$C_{iC}/S_i S_C = D_{iC}/N S_i$, [5]
and
$C_{iC} = S_C D_{iC}/N$, [6]
respectively.


From a well-known relation (3) the
variance $V_T$ of a test T containing n
items may be written as the sum of
the n item-test covariances. Using
this fact and summing [4] for n items,


$V_T = \sum C_{iT} = S_T \sum D_{iT}/N.$ [7]


Dividing [7] by $S_T$, an estimate of the
standard deviation of the original
test scores is obtained,


$S_T = \sum D_{iT}/N,$ [8]


as a function of the item counts de-
fined for use in [1]. Similarly, the
square of [8] gives an estimate of the
variance of the original scores.

The validity coefficient, or correla-
tion of test T with criterion C, may
be written


$r_{TC} = C_{TC}/S_T S_C = \sum C_{iC}/S_T S_C.$ [9]


Substituting [8] and $\sum C_{iC}$, which is
[6] summed for n items, in [9],


$r_{TC} = \sum D_{iC}/\sum D_{iT}.$ [10]

Thus [10] is the test validity esti-
mated solely from item counts.

For item-selection purposes it is
often required that the criterion cor-
relation of an item i significantly ex-
ceed zero before including it in an
experimental test. One way of
achieving this is to include i only if


$C_{iC} \geq S_i S_C z/\sqrt{N},$ [11]

where z = 1.96, or some larger value
of the normal deviate corresponding
to a known level of significance. Sub-
stituting [6] in [11],


$D_{iC} \geq S_i z \sqrt{N}.$ [12]


Use of [12] as an item selection condi-
tion has been discussed previously
(6), where it was noted that setting
$S_i = .5$, the maximum value, provides
a conservative statistical test which
has the practical effect of insuring
(a) that the test will contain (statisti-
cally) valid items, and (b) that most of
the selected items will have large
variances.
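
The selection condition [12] reduces to a single comparison per item. The following sketch uses the conservative default of an item standard deviation of .5 and z = 1.96; the function name and the example values are hypothetical.

```python
import math


def passes_selection(d_ic, n_subjects, item_sd=0.5, z=1.96):
    """Condition [12]: keep the item if D_iC >= S_i * z * sqrt(N)."""
    return d_ic >= item_sd * z * math.sqrt(n_subjects)


# With N = 100 and the conservative item_sd of .5, the critical value of
# D_iC is 0.5 * 1.96 * 10 = 9.8.
print(passes_selection(10, 100), passes_selection(6, 100))
```

Because the critical value depends only on N once the item standard deviation is fixed at .5, it can be computed once and applied to every item in the pool.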

Finally, it should be noted that 
[10] can also be written in another 
way, namely, 


$r_{TC} = k^2 r_{T'C'} = (1.051)^2 \sum T'C' / [(1.049)^2 N] \approx \sum T'C'/N$.   [13]


In [13] $k^2$ is the double correction for
grouping both the T and C scores
into the same-sized broad categories,
and $S_{T'} = S_{C'} = 1.049$. The final approximation
in [13], achieved by setting
$(1.051)^2/(1.049)^2$ equal to unity
(instead of 1.004), is still close enough
for our purposes.
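
The size of the approximation in [13] is easy to verify. The sketch below uses the two constants quoted in the text; the function names and the cross-product sum are hypothetical.

```python
# Illustrative sketch of [13]; function names and example data are hypothetical.

K = 1.051           # grouping correction k, as quoted in [13]
S_GROUPED = 1.049   # S_T' = S_C', as quoted in [13]


def validity_exact(sum_tc, n_subjects):
    """Middle expression of [13], keeping both constants."""
    return K ** 2 * sum_tc / (S_GROUPED ** 2 * n_subjects)


def validity_approx(sum_tc, n_subjects):
    """Final form of [13], with the ratio of constants set to unity."""
    return sum_tc / n_subjects


ratio = K ** 2 / S_GROUPED ** 2   # the factor dropped in the approximation
print(round(ratio, 3))
print(validity_exact(90.0, 200), validity_approx(90.0, 200))
```

With these hypothetical figures the two estimates differ by less than .002, which illustrates why the shorter form is adequate for a quick approximation.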

In some test-construction problems
it may be easier to obtain the sum of
transformed cross products $\sum T'C'$
and use [13] than it would be either
to compute variances from original
scores or to obtain additional item
counts for the purpose of estimating
the validity by using [10]. For example,
suppose it is required to construct
an experimental “criterion-specific”
test by applying [12] to a
pool of items. After papers have been
grouped according to their criterion
scores C, and items for which [12]
holds have been selected,¹ it is practically
always then necessary to obtain
the variance, validity, and reliability
of the test comprising the
selected items. These values can immediately
be approximated, without
first having to tally raw test
scores or to obtain item-test relationships,
by using [13] as follows.

¹ The papers should be marked at the time
[12] is applied to indicate later to which criterion
distribution category they belong.

First square [10], solve for $(\sum D_{iT})^2$,
and substitute the latter in the square
of [8] to obtain an expression for the
original variance,


$V_T = (\sum D_{iC})^2/(N r_{TC})^2$.   [14]


Substituting [14] for the variance in
Kuder-Richardson formula 20 for
reliability,


$r_{TT} = \frac{n}{n-1}\left[1 - \frac{(N r_{TC})^2 \sum V_i}{(\sum D_{iC})^2}\right]$.   [15]


Summation terms in [14] and [15] are
obtained by summing the $D_{iC}$ for
items selected by [12], and the item
variances $V_i$ are obtainable as usual
from the total item counts, also available
after using [12]. The validity
coefficient is needed and can be estimated
by [13]; once computed it may
then also be used in [14] and [15].
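
Equations [14] and [15] can be combined in a few lines. This is a sketch under stated assumptions: the function names and all numerical inputs are hypothetical, and the validity estimate is assumed to have been obtained beforehand from [13].

```python
# Illustrative sketch of [14] and [15]; names and data are hypothetical.

def variance_from_validity(sum_d_ic, n_subjects, r_tc):
    """Equation [14]: V_T = (sum D_iC)^2 / (N * r_TC)^2."""
    return sum_d_ic ** 2 / (n_subjects * r_tc) ** 2


def kr20_from_counts(n_items, sum_item_var, sum_d_ic, n_subjects, r_tc):
    """Equation [15]: Kuder-Richardson formula 20 with [14] as the variance."""
    v_t = variance_from_validity(sum_d_ic, n_subjects, r_tc)
    return n_items / (n_items - 1) * (1.0 - sum_item_var / v_t)


# Hypothetical inputs: 10 selected items, N = 100 subjects.
sum_d_ic = 120.0      # sum of D_iC over the selected items
sum_item_var = 2.25   # sum of the item variances V_i
r_tc = 0.4            # validity estimate, assumed to come from [13]

v_t = variance_from_validity(sum_d_ic, 100, r_tc)
r_tt = kr20_from_counts(10, sum_item_var, sum_d_ic, 100, r_tc)
print(v_t, r_tt)
```

The reliability estimate depends on the square of the validity estimate, so any error in [13] carries over to both [14] and [15].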

To obtain $\sum T'C'$ for use in [13], regroup
the papers into the 5 categories,
this time according to their T
scores, fill in the 5×5 contingency
table for frequencies of the T' and C'
scores (center categories may be ignored),
and sum the 16 kinds of cross
products. After the papers have been 
reordered according to T scores, the 
remaining operations take only a few 
minutes, even when there are a large 
number of subjects. 
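
The tallying step can be sketched as follows. The category scores used here (-2, -1, 0, 1, 2) are an assumption made only for illustration, since the transformed scores are defined in an earlier part of the article; the frequency table and the function name are likewise hypothetical. A center score of zero is consistent with the remark that center categories may be ignored, which leaves the 4 × 4 = 16 kinds of cross products.

```python
# Illustrative sketch of summing T'C' cross products from a 5x5 table.
# ASSUMPTION: the category scores below are hypothetical placeholders;
# the article defines the actual transformed scores elsewhere.
SCORES = (-2, -1, 0, 1, 2)


def sum_cross_products(table, scores=SCORES):
    """Sum score_T * score_C * frequency over every cell of the table."""
    total = 0
    for i, row in enumerate(table):
        for j, freq in enumerate(row):
            total += scores[i] * scores[j] * freq
    return total


# Hypothetical frequencies: rows are T' categories, columns are C' categories.
table = [
    [6, 5, 3, 1, 0],
    [5, 14, 12, 6, 1],
    [3, 12, 24, 12, 3],
    [1, 6, 12, 14, 5],
    [0, 1, 3, 5, 6],
]

n_subjects = sum(sum(row) for row in table)
sum_tc = sum_cross_products(table)
r_tc = sum_tc / n_subjects   # final form of [13]
print(n_subjects, sum_tc, r_tc)
```

Cells in the center row and center column contribute nothing to the sum, so only the sixteen outer cells need to be tallied by hand.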

It should be emphasized that the 
accuracy of the formulas is immedi- 
ately dependent upon fulfillment of 
the normality assumptions concern- 
ing the original test and criterion dis- 
tributions. In the case of item sta- 
tistics, departures from normality 
are, for reasons already discussed, 


not likely to be serious; however, estimates
such as [10], [13], [14], and [15]
depend upon normality assumptions 
for two distributions and therefore 
in practice should be regarded only 
as rapidly obtainable approximations 
to the actual values. 


AN EXAMPLE 


In a large-scale research a subtest 
comprising 33 true-false personality 
inventory items was for several rea- 
sons of theoretical interest. The 
items were taken from a larger mas- 
culinity-femininity factor scale, and 
appeared to measure “fantasy, sensitivity
and esthetic interest,” and possibly
also some kind of “neurotic conflict.”
The KR-20 reliability of the
33 items was .71. 

The 33-item subtest, hereafter referred
to as “X,” was scored for a
new random sample of 200 college
women. Statistics for the obtained
distribution corresponding to the
first four moments were $\bar{X} = 19.555$,
$V_X = 21.587$, $g_1 = -.1851$, and $g_2 = -.4366$.
Although the distribution
appeared slightly flattened and negatively
skewed, test ratios for $g_1$ and
$g_2$ (−1.080 and −1.280, respectively)
offered no evidence that the population
distribution was non-normal.

Papers were divided according to 
X scores into the five categories rec- 
ommended above, and item counts 
for 636 other true-false items were 
obtained (see [1]). Application of 
[12] with z = 2.58 and $S_i = .5$ selected
89 of the new items as potential correlates
of X. (Since z was chosen to
correspond to the .01 level, only
about 6 would be expected by chance
alone.)
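
As a check on the figures just quoted, substituting N = 200, z = 2.58, and $S_i = .5$ in [12] gives the cutoff applied to each item count, and multiplying the 636 items by the .01 level gives the number expected by chance:

```latex
D_{iC} \geq S_i z \sqrt{N} = (.5)(2.58)\sqrt{200} \approx 18.24,
\qquad
636 \times .01 \approx 6.4 .
```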

At the same time that item counts 
were obtained for the new items, 
counts were also obtained for the 33 
items in X. Since the reliability of X 
was only .71, it was not expected that 
its items would all correlate well with 
the total score; indeed, applying condition
[12] would retain only 22 of

too large, but which is still close
enough for a quickly computed approximation.
The sum of variances
EARLY USE OF THE TERM


VICARIOUS TRIAL AND ERROR (VTE) 


In their recent article in this Journal
on “Vicarious trial and error and
related behavior” (2), Goss and
Wischner say that “to this general
pattern of behavior Muenzinger and 
Fletcher have given the name ‘vicari- 
ous trial-and-error,’ abbreviated 
‘VTE’.” The origin of this term should 
have been ascribed to Muenzinger 
and Gentry. In the article referred to 
by Goss and Wischner I say so (4, p. 
89), but it is possible that I was not 
explicit enough. 

It was Evelyn Gentry (now Evelyn 
G. Hooker) who made the first study 
of the phenomenon in an experiment 
designed for this purpose and not as 
a by-product of other experiments. 
Her results were described in 1930 in 
an M.A. thesis under my direction 
(1) in which the term vicarious trial 
and error with its abbreviation VTE 
was used, and in which reference was
made to earlier descriptions of this 
kind of behavior by other experi- 
menters. 

Our criterion for recording VTE in 
any one trial was then and still is “a 
[illegible] into one alley before the other,
right or wrong, was entered.”

At first my co-workers and I 
thought that the presence of dis- 
criminanda within the choice alleys 
was the necessary condition for the 
occurrence of VTE. However, this 
view almost immediately turned out
to be wrong because we observed (in 
1929) that VTE also occurred when 
the choice alleys contained no dis- 
criminanda. In this case an animal 
had to make a left or a right turn in 
conjunction with the presence or 
absence of a tone that was sounded 1 
meter above the choice point (3). 
We realized that it was the mere pres- 
ence of the choice alleys that pro- 
duced VTE. 

We have always emphasized the 
role of experimental conditions in the 
relative frequency of VTE. To illu- 
strate, our observations throughout 
the years have invariably shown that 
as compared with no shock the pres- 
ence of electric shock after the point 
of choice is accompanied by more 
VTE (3, p. 78). We have also found 
invariably that in a difficult discrimi- 
nation situation the frequency of 
VTE is higher than in an easier one 
(3, p. 81). 

We have always assumed that a 
relation between VTE and learning 
efficiency exists. This was in line 
with the notion prevalent 30 years 
ago that actual trial and error is in- 
dispensable in certain types of learn- 
ing. But we have also stated ex- 
plicitly that “we have not demon- 
strated a causal relationship between 


the two” (3, p. 84).
ERRATUM


In “The Water-Jar Einstellung Test as a Measure of Rigidity,” by Eugene 
E. Levitt, Vol. 53, No. 5, September, 1956, p. 368, right-hand column;
For: “1. After eight years of research, evidence for the validity of the
water-jar test as a measure of validity is still lacking.”
Read: “1. After eight years of research, evidence for the validity of the
water-jar test as a measure of rigidity is still lacking.” 